Methods and systems for efficiently integrating a cryptographic co-processor

ABSTRACT

A method and system of processing a cryptographic packet includes receiving a first cryptographic packet in a host CPU. A first set of data required to execute the first cryptographic packet is identified. The first cryptographic packet and the required first set of data are transferred to a cryptographic co-processor. The first cryptographic packet is executed in the cryptographic co-processor. The host CPU is notified that the execution of the first cryptographic packet is complete. The executed first cryptographic packet is received in the host CPU.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application is related to commonly owned U.S. patent application Ser. No. 10/273,718 filed on Oct. 18, 2002 and entitled “Stream Processor with Cryptographic Co-Processor,” which is incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to microprocessors, and more particularly, to systems and methods for a microprocessor to efficiently integrate operations with an on-die co-processor.

[0004] 2. Description of the Related Art

[0005] Microprocessors can often include both a central processing unit (CPU) and a specialty co-processor on one die. The specialty co-processor can perform any type of operation to assist the CPU to rapidly process the required data. FIG. 1A shows an exemplary microprocessor die 100 that includes a CPU 110 and a co-processor 120. The co-processor 120 can be a cryptographic co-processor. The cryptographic co-processor 120 can be included on the same die 100 as the CPU 110 because a cryptographic operation is a relatively complex and time-consuming process. Therefore, having the cryptographic co-processor 120 on the same die 100 with the CPU 110 can allow for faster cryptographic operations as compared to having the cryptographic co-processor external to (e.g., peripheral to) the CPU die 100.

[0006] FIG. 1B is a flowchart of the method operations 140 for the typical CPU 110 and cryptographic co-processor 120 to process a cryptographic operation request. FIG. 1C is a graphical representation of a time line 180 for processing the same cryptographic operation request. In operation 142, the CPU 110 receives an operation request such as a data packet. In operation 144, the CPU 110 identifies the received request as a crypto operation request. By way of example, an IPsec encrypted packet can be received in the CPU 110. Software in the CPU can identify the packet as an IPsec encrypted packet. In operation 146, the CPU sends the IPsec packet to the crypto co-processor 120 to be processed.

[0007] Referring now to both FIGS. 1B and 1C, operations 142-144 occur between time T₀ and time T₁, and operation 146 occurs at time T₁. Between times T₀ and T₁, the crypto co-processor 120 is sitting idle (i.e., stalled) waiting for a crypto operation request to be transferred to the crypto co-processor. At time T₁, in operation 146, once the crypto operation request is transferred to the crypto co-processor 120, the CPU retrieves and begins processing a subsequent operation request or requests in operation 148.

[0008] Between time T₁ and time T₂, the crypto co-processor 120 processes the crypto operation request such as in operations 150-152. In operation 150, the crypto co-processor 120 identifies the data required to execute the crypto operation. By way of example, a required decryption key may be identified. Unfortunately, the crypto co-processor 120 cannot access the required decryption key because it does not know where the key is located. Further, the crypto co-processor 120 cannot directly access the memory. In operation 152, at time T₂, the crypto co-processor 120 sends a request for the identified data to the CPU 110. The crypto co-processor 120 then stalls because processing the single crypto operation request cannot continue until the required decryption key is received by the crypto co-processor.

[0009] In operation 154, the CPU 110 interrupts the operation request then currently being processed. Alternatively, the CPU 110 can wait until the then current operation request is completed. In operation 156, the CPU 110 retrieves the identified data (e.g., the decryption key). In operation 158, the CPU 110 provides the identified data to the crypto co-processor 120. Once the identified data is provided to the crypto co-processor 120, the CPU 110 can resume the interrupted operation request or alternatively retrieve a subsequent operation request. If the subsequent operation request is identified as another crypto operation request, the CPU 110 may stall waiting for the crypto co-processor to be available to execute the subsequent crypto operation request.

[0010] At time T₃, in operations 160 and 162, the crypto co-processor 120 resumes processing the crypto operation request and completes the crypto operation request. By way of example, the crypto co-processor can decrypt the crypto operation request to produce a decryption result.

[0011] In operation 164, at time T₄, the crypto co-processor 120 notifies the CPU 110 that the current crypto operation request has been completed. The crypto co-processor 120 then stalls until the CPU 110 requests the result of the completed crypto operation request. In operation 166, the CPU 110 interrupts (or alternatively completes) the then current operation request before responding to the completion notice from the crypto co-processor 120 at time T₅.

[0012] Operations continue in a similar manner as subsequent crypto operation requests are received in the CPU 110 and passed to the crypto co-processor 120 for execution. The above-described method operations 142-166 are very inefficient because the crypto co-processor 120 is often stalled waiting for the necessary data to complete a crypto operation request. Further, the constant interruptions of the CPU 110 by the crypto co-processor 120 reduce the efficiency of processing in the CPU. Further still, if the CPU 110 retrieves multiple crypto operation requests in short succession (e.g., before the crypto co-processor 120 has completed the previous crypto operation request), the CPU may stall waiting for the crypto co-processor to become available to execute a subsequent crypto operation request.

[0013] In the past, these shortcomings have been addressed in numerous approaches. One approach has been to increase the speed (i.e., frequency) of the data bus (e.g., bandwidth) between the CPU 110 and the crypto co-processor 120. Including both the CPU and crypto co-processor on the same die 100 has also reduced some delay times and increased throughput somewhat. Another approach has been to simply drive the processing speed (e.g., clock speed) of the CPU and crypto co-processor ever faster. However, each of these approaches has failed to address the fundamental problem of an inefficient system and method of communication between the CPU 110 and the crypto co-processor 120. In view of the foregoing, there is a need for a system and method to provide improved communication efficiency between the CPU 110 and the crypto co-processor 120.

SUMMARY OF THE INVENTION

[0014] Broadly speaking, the present invention fills these needs by providing a system and method for improved communication efficiency between the CPU and the crypto co-processor. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, computer readable media, or a device. Several inventive embodiments of the present invention are described below.

[0015] One embodiment includes a method of processing a cryptographic packet that includes receiving a first cryptographic packet in a host CPU. A first set of data required to execute the first cryptographic packet is identified. The first cryptographic packet and the required first set of data are transferred to a cryptographic co-processor. The first cryptographic packet is executed in the cryptographic co-processor. The host CPU is notified that the execution of the first cryptographic packet is complete. The executed first cryptographic packet is received in the host CPU.

[0016] Identifying the first set of data required to execute the first cryptographic packet can also include identifying the required first set of data in a first control word. The control word can include instructions for the crypto co-processor.

[0017] Transferring the first cryptographic packet and the required first set of data to the cryptographic co-processor can include transferring the first cryptographic packet and the required first set of data through a control queue. Identifying the first set of data required to execute the first cryptographic packet can include identifying the required first set of data in a first control word. The first control word can be located in the control queue. The first control word can also identify a first storage location of the first cryptographic packet and a second storage location of the required first set of data.

[0018] Notifying the host CPU that the execution of the first cryptographic packet is complete can include modifying a field in the first control word. Modifying the field in the first control word can include identifying a third location of an execution result of the executed first cryptographic packet. Receiving the executed first cryptographic packet in the host CPU can also include the host CPU retrieving the execution result from the third location identified by the first control word.

[0019] Transferring the first cryptographic packet and the required first set of data to the cryptographic co-processor can include receiving a subsequent packet in the host CPU and executing the subsequent packet in the host CPU. If the subsequent packet is a second cryptographic packet, then executing the subsequent crypto packet in the host CPU can include identifying a second set of data required to execute the second cryptographic packet, transferring the second cryptographic packet and the required second set of data to the cryptographic co-processor, executing the second cryptographic packet in the cryptographic co-processor, notifying the host CPU that the execution of the second cryptographic packet is complete, and receiving the executed second cryptographic packet in the host CPU.

[0020] The second cryptographic packet can be executed in the cryptographic co-processor substantially in parallel with executing the first cryptographic packet. Alternatively, the second cryptographic packet can be executed in the cryptographic co-processor in series with executing the first cryptographic packet.

[0021] If the subsequent packet is a second cryptographic packet, then executing the subsequent packet in the host CPU can include identifying a second set of data required to execute the second cryptographic packet, transferring the second cryptographic packet and the required second set of data to the cryptographic co-processor via an interface, executing the second cryptographic packet in the cryptographic co-processor substantially in parallel with executing the first cryptographic packet, notifying the host CPU that the execution of the second cryptographic packet is complete, and receiving the executed second cryptographic packet in the host CPU.

[0022] Another embodiment includes a microprocessor that includes a host CPU, a cryptographic co-processor, and a control queue coupled to the host CPU and the cryptographic co-processor. The cryptographic co-processor can include multiple hardware units and at least one software component. The multiple hardware units can include one or more crypto units that are optimized to perform a selected encryption process.

[0023] The control queue can be a storage location in the microprocessor. An interface coupled between the host CPU and the cryptographic co-processor can also be included. The interface is capable of transferring an instruction from the host CPU to the cryptographic co-processor. The interface can be a set of hardware registers.

[0024] Another embodiment includes a method of processing a cryptographic packet. The method includes receiving a first cryptographic packet in a host CPU. A first set of data required to execute the first cryptographic packet is identified in a first control word. The first control word is located in a control queue and identifies a first storage location of the first cryptographic packet and a second storage location of the required first set of data. The first cryptographic packet and the required first set of data are transferred to a cryptographic co-processor. The first cryptographic packet is executed in the cryptographic co-processor. A field in the first control word is modified to notify the host CPU that the execution of the first cryptographic packet is complete. The modification of the first control word can include identifying a third location of an execution result of the executed first cryptographic packet. The host CPU retrieves the execution result from the third location identified by the first control word.

[0025] Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0026] The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designate like structural elements.

[0027] FIG. 1A shows an exemplary microprocessor die that includes a CPU and a co-processor.

[0028] FIG. 1B is a flowchart of the method operations for the typical CPU and cryptographic co-processor to process a cryptographic operation request.

[0029] FIG. 1C is a graphical representation of a time line for processing the same cryptographic operation request.

[0030] FIG. 2 is a block diagram of a microprocessor die in accordance with one embodiment of the present invention.

[0031] FIG. 3 is a flowchart of the method operations of the CPU and cryptographic co-processor to process a crypto packet, in accordance with one embodiment of the present invention.

[0032] FIG. 4 is a graphical representation of a time line for processing the same cryptographic packet.

[0033] FIG. 5 is a graphical representation of an initial control word that may be stored in the control queue, in accordance with one embodiment of the present invention.

[0034] FIG. 6 shows an extension control word in accordance with one embodiment of the present invention.

[0035] FIG. 7 shows a final control word in accordance with one embodiment of the present invention.

[0036] FIG. 8A shows a set of non-RC4 registers in accordance with one embodiment of the present invention.

[0037] FIG. 8B shows a format of Control Register 0 in accordance with one embodiment of the present invention.

[0038] FIG. 8C shows an exemplary format of Control Register 1 in accordance with one embodiment of the present invention.

[0039] FIG. 9 shows the RC4 state registers according to one embodiment of the present invention.

[0040] FIG. 10 is a block diagram of the crypto co-processor in accordance with one embodiment of the present invention.

[0041] FIG. 11A shows an exemplary format of an IPsec packet encapsulating a TCP datagram for transport mode AH, in accordance with one embodiment of the present invention.

[0042] FIG. 11B shows an exemplary format of an IPsec packet encapsulating a TCP datagram for tunnel mode AH, in accordance with one embodiment of the present invention.

[0043] FIG. 11C shows an exemplary general packet format for transport mode ESP, in accordance with one embodiment of the present invention.

[0044] FIG. 11D shows a more detailed view of the packet format starting at the ESP header in accordance with one embodiment of the present invention.

[0045] FIG. 12 shows a general packet format for a tunnel mode ESP, according to one embodiment of the present invention.

[0046] FIG. 13 shows an exemplary packet format of a transport mode bundle in accordance with one embodiment of the present invention.

[0047] FIG. 14 shows an exemplary packet (record) format before SSL/TLS in accordance with one embodiment of the present invention.

[0048] FIG. 15 shows an exemplary encrypted packet format for SSL/TLS in accordance with one embodiment of the present invention.

[0049] FIG. 16 shows an exemplary encrypted packet format with the final byte defining the length of the padded data, in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENTS

[0050] Several exemplary embodiments of a system and method for improved communication efficiency between the CPU and the crypto co-processor will now be described. It will be apparent to those skilled in the art that the present invention may be practiced without some or all of the specific details set forth herein.

[0051] One embodiment of an improved communication method between a CPU and a crypto co-processor on the same die includes the CPU providing all of the information required by the crypto co-processor to fully execute a crypto operation request when the CPU sends the crypto operation request to the crypto co-processor. In this manner, the crypto co-processor can efficiently execute the crypto operation request with minimum stalls and with a minimal number of communication exchanges between the CPU and the crypto co-processor.

[0052] FIG. 2 is a block diagram of a microprocessor die 200 in accordance with one embodiment of the present invention. The die 200 includes a CPU 205 and a crypto co-processor 250. An interface 215 couples the CPU 205 and the crypto co-processor 250. A control queue 210 is also included for providing control information between the CPU 205 and the crypto co-processor 250.

[0053] The CPU can also be coupled to a network interface 220. The network interface 220 provides data communication between the microprocessor 200 and other computer systems coupled to the microprocessor 200. Hence, network interface 220 may be any device suitable for or enabling the microprocessor 200 to communicate data with a remote processing system (e.g., a client computer) over a data communication link, such as a conventional telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a cable modem, a satellite transceiver, an Ethernet adapter, or the like.

[0054] A data packet 222 is shown being received in the network interface 220. The network interface can be an Ethernet or any other type of network interface. The network interface 220 and/or the CPU 205 can include software 204 that determines whether an incoming packet 222 is an encrypted packet (e.g., IPsec, SSL, TLS, etc.) or a non-encrypted packet.

[0055] The control queue 210 can be any storage location that is known to (e.g., stored within) and accessible by both the CPU 205 and the crypto co-processor 250. The control queue 210 can be any memory location accessible to both the CPU 205 and the crypto co-processor 250.

[0056] Crypto co-processor 250 can enable higher speed encryption and decryption processes sufficient to support a data transfer speed of up to about 4 Gb/sec or higher in SSL and IPsec. The crypto co-processor 250 also allows crypto processing to overlap with execution of normal (i.e., non-cryptographic) instructions. The crypto co-processor 250 can be accessed through a set of hardware registers. In one embodiment, the crypto co-processor can share arithmetic units (e.g., integer multiplier unit, etc.) and memory access units with the main CPU so as to reduce duplicated hardware on the microprocessor die 200. Sharing the memory access units also can allow the crypto co-processor to directly access the memory system rather than rely on the host CPU to provide that access. The crypto co-processor 250 can also support encrypted data streams such as DES/3-DES/RC-4/SHA-1/MD-5 at 3-6 clocks/byte/core. The crypto co-processor 250 can also support public key (e.g., RSA, integer ECC) crypto functions. The crypto co-processor 250 can also support processor intensive functions such as Montgomery multiply, exponentiation, and reduction, thereby freeing CPU resources for other purposes.

[0057] The crypto co-processor 250 and CPU 205 can also share various hardware components (e.g., memory access, arithmetic unit, integer multiplier unit, etc.) as described in commonly owned U.S. patent application Ser. No. 10/273,718 filed on Oct. 18, 2002 and entitled “Stream Processor with Cryptographic Co-Processor” by Kohn, which is incorporated by reference herein, in its entirety.

[0058] FIG. 3 is a flowchart of the method operations 300 of the CPU 205 and cryptographic co-processor 250 to process a crypto packet, in accordance with one embodiment of the present invention. FIG. 4 is a graphical representation of a time line 400 for processing the same cryptographic packet. Referring to both FIGS. 3 and 4, at time T₀, and in operation 302, the CPU 205 receives a packet. In operation 304, the CPU 205 identifies the packet as a crypto packet (e.g., IPsec, SSL, TLS, etc.).

[0059] In operation 306, the CPU 205 identifies any additional data required to execute the crypto packet. The additional data can include an encryption or decryption key, and data to be encrypted or decrypted or other data required to complete the execution of the crypto packet.

[0060] In operation 310, the CPU 205 identifies the additional data in the control queue 210. The additional data can be identified in a control word 212 as will be described in more detail below.

[0061] In operation 312, the CPU 205 transfers the crypto packet 222 to the crypto co-processor 250 and stores the corresponding control word 212 in the control queue 210. Transferring the crypto packet 222 to the crypto co-processor 250 can include identifying a storage location of the crypto packet 222 in the control word 212. In operation 314, the CPU receives and begins processing a subsequent packet.

[0062] At time T₁ and in operation 316, the crypto co-processor 250 receives the crypto packet 222 and retrieves the corresponding control word 212 from the control queue 210. In operation 318, the crypto co-processor 250 processes the crypto packet 222. In processing the crypto packet 222, the crypto co-processor 250 uses the required additional data identified in the control word 212. At time T₂ and in operation 322, the crypto co-processor 250 updates the control word to identify the crypto packet as being completed. Updating the control word 212 can include identifying a storage location of the execution result of the executed crypto packet 222. Alternatively, the CPU can identify the location of the execution result of the executed crypto packet 222 in the control word 212 before the control word is transferred to the crypto co-processor 250.

[0063] If, between time T₁ and time T₂, a subsequent packet is processed in the CPU 205 and the subsequent packet is also a crypto packet, then the CPU can identify the required additional data for the subsequent crypto packet and forward the subsequent crypto packet to the crypto co-processor and also forward a corresponding subsequent control word to the control queue 210. In this manner, a subsequent crypto packet will not stall the CPU 205 until the crypto co-processor 250 is available. Further, if such a subsequent crypto packet is forwarded by the CPU 205 while the crypto co-processor 250 is busy processing crypto packet 222, then in operation 322, the crypto co-processor can retrieve the corresponding subsequent control word to immediately process the subsequent crypto packet upon completion of the previous crypto packet.

[0064] In operation 324, the CPU 205 polls or otherwise reviews the control word 212. The CPU 205 can periodically check the control word. Alternatively, updating the control word in operation 322 can send an interrupt to the CPU to notify the CPU to check the control word. When the CPU 205 checks the control word 212 and the control word has been updated in operation 322 above, the CPU can retrieve the results of the executed crypto packet 222. The method operations 300 can then end for the packet 222.
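The exchange described in operations 306 through 324 can be illustrated with a small software model. The following is a minimal sketch only: the `ControlWord` fields, the `host_submit`, `coprocessor_run`, and `host_poll` names, the `memory` dictionary, the result-buffer offset, and the byte-wise XOR standing in for a real cipher are all assumptions made for illustration and are not taken from the described hardware.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlWord:
    packet_addr: int                   # storage location of the crypto packet
    key_addr: int                      # storage location of the required additional data
    result_addr: Optional[int] = None  # location of the execution result, set on completion
    done: bool = False                 # completion field checked by the host CPU


memory = {}              # hypothetical shared memory: address -> bytes
control_queue = deque()  # FIFO of control words awaiting the co-processor


def host_submit(packet_addr, key_addr):
    """Host CPU: identify packet and data locations, then queue the control word."""
    cw = ControlWord(packet_addr, key_addr)
    control_queue.append(cw)
    return cw  # the host is free to process other packets after this returns


def coprocessor_run():
    """Co-processor: drain the queue without asking the host for more data."""
    while control_queue:
        cw = control_queue.popleft()
        packet = memory[cw.packet_addr]
        key = memory[cw.key_addr]
        # Placeholder transform (XOR), used only so the example produces a result.
        result = bytes(b ^ key[i % len(key)] for i, b in enumerate(packet))
        cw.result_addr = cw.packet_addr + 0x1000  # hypothetical result buffer
        memory[cw.result_addr] = result
        cw.done = True  # updating the control word notifies the host


def host_poll(cw):
    """Host CPU: check the control word; fetch the result once it is marked done."""
    return memory[cw.result_addr] if cw.done else None
```

Because every location the co-processor needs is carried in the control word, the model never calls back into the host while a packet is being processed, which is the property the embodiment relies on to avoid stalls.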

[0065] The crypto co-processor 250 and CPU 205 communicate primarily through the control word queue 210. Logically, the control word queue 210 is a circular, FIFO (first in first out) queue of commands from software to the crypto co-processor 250, and a circular, FIFO queue of status reports from the crypto co-processor 250 back to CPU 205. Complex commands may require multiple control words. The control word format can also provide a mechanism to group multiple control words into control word blocks for complex commands.

[0066] The control word queue 210 is implemented in a region in cacheable memory. CPU 205 allocates a region in memory for the control word queue 210. The crypto co-processor 250 and CPU can also use a multiple register interface to manage the control word queue 210. In one embodiment, four hardware registers are used and can be identified as the First, Last, Head, and Tail registers. The First register holds the address of the first (lowest) control word queue entry. Similarly, the Last register holds the address of the last (highest) control word queue entry. The Head register holds the address of the next control word block for the crypto co-processor 250 to process. The Tail register holds the address of the next control word to be written by software. CPU 205 wraps the Head pointer and software wraps the Tail pointer to First when either pointer passes Last. The queue is full if Head − Tail == 1 or if Head == First and Tail == Last; only Last − First entries can be in use at a time.
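The pointer arithmetic of the four-register scheme can be sketched in software as follows. This is an illustrative model only: the class and method names are hypothetical, and register values are treated as entry indices rather than byte addresses so that the full test "Head − Tail == 1" can be applied directly.

```python
class ControlWordQueueRegs:
    """Software model of the First, Last, Head, and Tail registers (entry indices)."""

    def __init__(self, first, last):
        self.first = first  # index of the lowest queue entry
        self.last = last    # index of the highest queue entry
        self.head = first   # next entry for the co-processor to process
        self.tail = first   # next entry to be written by software

    def _advance(self, ptr):
        # Wrap back to First when the pointer passes Last.
        return self.first if ptr == self.last else ptr + 1

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        # Equivalent to: Head - Tail == 1, or Head == First and Tail == Last.
        return self._advance(self.tail) == self.head

    def entries_in_use(self):
        span = self.last - self.first + 1
        return (self.tail - self.head) % span

    def write_entry(self):
        """Software side: claim the entry at Tail and advance Tail."""
        if self.is_full():
            raise OverflowError("control word queue is full")
        self.tail = self._advance(self.tail)

    def consume_entry(self):
        """Consumer side: finish the entry at Head and advance Head."""
        if self.is_empty():
            raise LookupError("control word queue is empty")
        self.head = self._advance(self.head)
```

One slot is always left unused so that a full queue can be distinguished from an empty one, which is why at most Last − First entries can be in use at a time.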

[0067] At reset, all four registers should initialize to zero. The control word queue 210 is in the uninitialized state while the Tail register is zero. During a boot sequence, the CPU 205 should allocate space for the control word queue and should set the First and Last registers to indicate the region allocated. At this time, the CPU 205 should set the Head and Tail registers to be equal to the First register. The control word queue 210 transitions to the idle state (since the Head and Tail registers are equal and the Tail is nonzero).

[0068] At some later time, the CPU 205 writes the first command into the control word queue 210 starting at the address in the Tail register. Once the CPU 205 has written a complete command to the control word queue 210, it updates the Tail register to point to the next unused control word. The CPU 205 may also write multiple commands to the control word queue 210 at a time before updating the Tail pointer. However, this will tend to limit overlap time between the crypto co-processor 250 and the CPU 205. The crypto co-processor 250 detects that the Head and Tail registers are no longer equal, which indicates the control word queue 210 has transitioned into the active state. The crypto co-processor 250 can begin fetching control words, interpret the fetched control words, and begin processing the commands contained within the control words.

[0069] Once the crypto co-processor 250 has completed a control word block (which contains one or more control words), the crypto co-processor records status information into the leading control word of the control word block. At this time, the crypto co-processor 250 also updates the Head register to point to the control word after the last in the group. As long as the Head and Tail registers are not equal, the control word queue 210 remains active and the crypto co-processor 250 continues processing control words.

[0070] Software 204 can monitor the Head register. When the Head register changes, software 204 can then read the status for the completed group from the control word queue 210.
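The uninitialized, idle, and active states described above follow directly from the Head and Tail register values. The sketch below walks through reset, boot, one command submission, and its completion. The `queue_state` function, the `regs` dictionary, the region address, and the 10-entry command size are hypothetical values chosen for illustration, and pointer wrap-around is omitted for brevity.

```python
ENTRY_BYTES = 8  # assumes 64-bit control word queue entries, as in one embodiment


def queue_state(head, tail):
    """Classify the control word queue from the Head and Tail register values."""
    if tail == 0:
        return "uninitialized"  # Tail still holds its reset value
    if head == tail:
        return "idle"           # no unprocessed control words
    return "active"             # software has written control words not yet processed


# Reset: all four registers read zero.
regs = {"first": 0, "last": 0, "head": 0, "tail": 0}
states = [queue_state(regs["head"], regs["tail"])]

# Boot: the CPU allocates a region (hypothetical address) and programs the registers.
regs["first"] = 0x1000
regs["last"] = 0x1000 + 63 * ENTRY_BYTES
regs["head"] = regs["tail"] = regs["first"]
states.append(queue_state(regs["head"], regs["tail"]))

# Later: the CPU writes a command (hypothetically 10 entries) and then updates Tail.
regs["tail"] += 10 * ENTRY_BYTES
states.append(queue_state(regs["head"], regs["tail"]))

# Completion: the co-processor records status and moves Head past the block.
regs["head"] = regs["tail"]
states.append(queue_state(regs["head"], regs["tail"]))
```

Software monitoring the Head register would observe the change made in the last step and could then read the status recorded for the completed block.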

[0071] In an alternate control register scheme, the control word queue 210 could be managed with 3 hardware registers: a Size register, a Head register, and a Tail register. This control register scheme requires that the control word queue 210 have 2^(N) entries, where N is a nonzero positive integer, and that the control word queue be aligned to a 2^(N+3) byte boundary. In this scheme, the Size register would hold the value of N. The crypto co-processor 250 and software would derive the first and last addresses by:

First = Head & ~(2^(N+3) − 1)

Last = First | (2^(N+3) − 1)
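As a worked illustration of this derivation, the following sketch computes the two bounds from a Head value and the Size value N. The function name `queue_bounds` and the sample address are hypothetical; the arithmetic simply applies the two expressions as written, assuming 8-byte entries so that the queue occupies 2^(N+3) bytes.

```python
def queue_bounds(head, n):
    """Derive (First, Last) from the Head register and the Size register value N."""
    region_bytes = 1 << (n + 3)  # 2^N entries of 8 bytes each
    mask = region_bytes - 1
    first = head & ~mask         # clear the offset bits within the aligned region
    last = first | mask          # set all offset bits within the aligned region
    return first, last


# Example: N = 4 gives a 128-byte region; a Head of 0x1238 lies in 0x1200..0x127F.
first, last = queue_bounds(0x1238, 4)
```

Because the region is aligned to its own size, any pointer inside the queue yields the same bounds, so separate First and Last registers are not needed in this scheme.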

[0072] All other aspects of the control word queue 210 operation would be substantially similar to that described above using four hardware registers.

[0073] As described above, multiple control words can be grouped into a single control word block. Using a control word block, the CPU 205 can specify the source packet as a collection of buffers in memory. The CPU 205 can specify a source address and length for each buffer, and can chain many buffers together to specify a complete source packet.

[0074] The control word block provides a high level interface for IPsec and SSL processing, while still supporting software access to the authentication and encryption algorithms. For outbound IPsec, the software 204 constructs a prototype packet in memory that contains all the data and IP header information needed to construct the final IPsec packet. For outbound SSL, the software 204 provides pointers to the pseudoheader concatenated with the application data. In both cases, software 204 provides one or more destination addresses, which point to memory sufficient to hold the output datagram. Inbound processing is similar: software 204 provides the crypto co-processor 250 with the inbound packet and a destination buffer large enough to hold the decoded, authenticated datagram. In addition to the data, the crypto co-processor 250 also needs pointers to the keys and initialization vectors (IV) used in authentication and encryption. These pointers to the keys and IVs can be included in the control word 212.

[0075] In one embodiment, control words have four basic formats: initial control words, extension control words, final control words, and complete control words. In alternative embodiments, fewer or more than four control word formats may be used. The following four types of control words are provided for exemplary purposes only and should not limit the scope of the invention to only the described types of control words. As shown in Table 1A below, the composition of the control word block depends on the number of source packet fragments.

TABLE 1A

Number of Source    Use Complete     Use Initial      Number of Extension            Use Final
Packet Fragments    Control Word?    Control Word?    Control Words                  Control Word?
1                   Yes              No               0                              No
2                   No               Yes              0                              Yes
3                   No               Yes              1                              Yes
4+                  No               Yes              Number of fragments minus 2    Yes
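The rule in Table 1A can be expressed as a short function. The sketch below is illustrative only: the function name and the string labels are hypothetical, and placing the final control word last in the block is an assumption inferred from its name (the description states only that the initial control word comes first).

```python
def control_word_block(num_fragments):
    """Return the control word types for a source packet split into num_fragments buffers."""
    if num_fragments < 1:
        raise ValueError("a source packet has at least one fragment")
    if num_fragments == 1:
        return ["complete"]  # a single fragment needs only a complete control word
    # Two or more fragments: initial + (fragments - 2) extensions + final.
    return ["initial"] + ["extension"] * (num_fragments - 2) + ["final"]
```

For example, a source packet held in five buffers would be described by one initial control word, three extension control words, and one final control word, so that each control word in the block corresponds to one fragment.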

[0076] The number of entries in the control word queue 210 used by each type of control word is shown in Table 1B below. In one embodiment, each control word queue entry is 64 bits (8 bytes). In alternative embodiments, each control word queue entry could be larger or smaller than 64 bits.

TABLE 1B

Control Word Type         Number of Control Word Queue Entries
Initial Control Word      10
Extension Control Word    2
Final Control Word        2
Complete Control Word     10
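Combining the entry counts of Table 1B with the block composition of Table 1A gives the queue space consumed by one control word block. The following sketch assumes 8-byte entries as in the embodiment above; the names `ENTRIES_PER_TYPE`, `block_queue_entries`, and `block_queue_bytes` are hypothetical.

```python
ENTRY_BYTES = 8  # 64-bit control word queue entries (one embodiment)

# Queue entries occupied by each control word type, per Table 1B.
ENTRIES_PER_TYPE = {"initial": 10, "extension": 2, "final": 2, "complete": 10}


def block_queue_entries(num_fragments):
    """Queue entries used by the control word block for num_fragments source buffers."""
    if num_fragments < 1:
        raise ValueError("a source packet has at least one fragment")
    if num_fragments == 1:
        return ENTRIES_PER_TYPE["complete"]
    extensions = num_fragments - 2  # per Table 1A
    return (ENTRIES_PER_TYPE["initial"]
            + extensions * ENTRIES_PER_TYPE["extension"]
            + ENTRIES_PER_TYPE["final"])


def block_queue_bytes(num_fragments):
    """Bytes of queue memory used by the same control word block."""
    return block_queue_entries(num_fragments) * ENTRY_BYTES
```

Under these assumptions a single-fragment packet uses 10 entries (80 bytes), and each additional fragment adds 2 entries (16 bytes), which software could use to check available queue space before writing a block.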

[0077] An initial control word is the first control word in a control word block that has multiple control words. The CPU 205 can use multiple control words if the source packet is stored in more than one memory location. FIG. 5 is a graphical representation of an initial control word 212 that may be stored in the control queue 210, in accordance with one embodiment of the present invention.

[0078] The most significant bit of the operation field can be set to ‘1’ to indicate an in-place operation or alternatively can be set to ‘0’ to indicate a copy operation. An in-place operation implies that the destination address field is ignored. The remaining 7 bits indicate what high level operation the crypto co-processor 250 should perform for this control word block. Table 1C below documents the supported values for this field, according to an exemplary embodiment of the present invention. Values 64 and above indicate that the hardware does not inspect the packet to determine fields for encryption and decryption; instead, the operations to be performed are specified directly in the encryption type and authentication type fields based upon the opcode.

TABLE 1C Operation Field bits 1:7

Value   Meaning
0       Treat source data as IPv4 packet for IPsec transport processing
1       Treat source data as IPv6 packet for IPsec transport processing
2       Treat source data as IPv4 packet for IPsec tunnel processing
3       Treat source data as IPv6 packet for IPsec tunnel processing
4       Treat source data as SSL 3.0 data block
5       Treat source data as TLS 1.0 data block
6-7     Reserved
8       Treat source data as MPA (marker PDU aligned) ULPDU (upper-level protocol data unit) for outbound, or MPA FPDU (frame protocol data unit) for inbound processing. The first 32-bit word of the source data MUST be the first word of the ULPDU (outbound) or FPDU (inbound). Hardware will use the most significant 2 bytes of this 32-bit word as the ULPDU length for further processing.
9-15    Reserved
16      Treat source data as SSL pre-master data block for master secret generation. The source data must consist of the pre-master secret, followed by the ClientHello.Random and ServerHello.Random values.
17      Treat source data as SSL master secret and use the Length field to generate the SSL keyblock. Hardware ignores all fields except the Opcode, Length, Source Address, and Destination Address fields.
18      Treat source data as the SSL pre-master data block and use the Length field to generate both the master secret and the keyblock. This operation combines opcodes 16 and 17.
19      Reserved
20      Treat source data as TLS pre-master data block for master key generation. The source data must consist of the pre-master secret, followed by the ClientHello.Random and ServerHello.Random values.
21      Treat source data as TLS master secret; use the Length field to generate the TLS keyblock.
22      Treat source data as TLS pre-master data block. Use the Length field to generate both the master secret and the TLS keyblock immediately following the master secret. This opcode combines opcodes 20 and 21.
23-31   Reserved
32      For outbound packets, treat the source data as an IPv4 header. Parse the IP header, generate the header checksum, and place the checksum in the appropriate field in the header. For inbound packets, treat the source data as an IPv4 header, and validate the header checksum. The computed checksum will be placed at the destination location for a copy operation; for an in-place operation, the checksum will only be validated.
33      For outbound packets, treat the source data as an IPv4 or IPv6 header encapsulating TCP or UDP data. For an outbound packet, hardware will parse the IPv4 and TCP/UDP headers and compute both the TCP/UDP payload checksum and the IPv4 header checksum. The checksums will be written at the appropriate offset into the destination. For inbound packets, hardware will compute and validate both the IPv4 header and TCP/UDP payload checksums. For IPv6, no header checksum will be computed or validated.
34-40   Reserved
41      For outbound packets, treat the source data as an IPv4 or IPv6 header encapsulating TCP data, which in turn encapsulates an MPA ULPDU. Compute and add padding bytes, markers, and generate the CRC32c to form the MPA FPDU. Then, generate and store both the TCP checksum and the IPv4 header checksum. The CPU 205 will parse the IP header and TCP header to determine the length of the TCP and IP headers. It will then read the size of the ULPDU from the appropriate offset from the TCP headers. For inbound data, treat the source as an IPv4 or IPv6 header encapsulating TCP data, which in turn encapsulates an MPA FPDU. The CPU 205 will validate the IPv4 and TCP checksums, then validate the CRC32c and markers. Assuming the checksums and CRC/marker validation succeed, the CPU 205 will remove the padding bytes, markers, and CRC32c, leaving an IP-encapsulated, TCP-encapsulated, MPA ULPDU at the destination. The CPU 205 will not alter the IP length field. For an outbound packet, software must calculate the size increase due to padding bytes, markers, and the CRC32c, and adjust the TCP sequence number for the next packet and IP length before transmitting the packet. Similarly, on an inbound packet, software must derive the number of padding, marker, and CRC32c bytes from the IP length field and the header lengths. The CPU 205 will perform one pass over the source data for both inbound and outbound operations, thereby minimizing fetch and store bandwidth.
42-63   Reserved
64      Only perform the cipher as specified in the Encryption Type field, starting at the source address for the number of bytes specified in the Length field. The software 204 must ensure that the length is a multiple of the block cipher length; hardware will set an error bit in the Hardware Status Field if this is not the case. The entire length will be processed. For an in-place operation, the source data will be overwritten with the ciphertext. For a copy operation, the ciphertext is written to memory starting at the destination address.
65      Only perform the authentication MAC (message authentication code) as specified in the Authentication Type field starting at the source address for the number of bytes specified in the Length field. The crypto co-processor 250 will implicitly pad the hash length as required by the hash algorithm. The crypto co-processor 250 will initialize the hash state from memory starting at the address contained in the Authentication IV field. For HMACs (Hashed Message Authentication Codes), the crypto co-processor 250 will generate the hash key from memory starting at the address contained in the Authentication Key Address field. For an in-place operation, the hash result is written to memory at the address specified in the Destination Address field. The number of bytes written is specified by the contents of the Hash Length field. The source data is not copied to the destination. For a copy operation, the source data is copied to the address specified by the Destination Address field. The hash result is written to memory immediately following the copied source data. The length of the hash result is specified by the Hash Length field. For either an in-place or a copy operation, the final state of the hash will be written to memory starting at the address specified by the contents of the Final Authentication State Address field. The length of the hash state is specified implicitly by the algorithm (MD5: 16 bytes; SHA-1: 20 bytes; SHA-256: 32 bytes).
66      Perform the cipher specified in the Encryption Type field, then perform the authentication specified by the Authentication Type field. The cipher operation starts at the byte named by the Source Address field. The length of the cipher is specified by the Length field and must be a multiple of the block cipher length. The source address of the hash is computed by adding the 2nd Operation Offset field to the Source Address field. The length of the field to hash is specified by the 2nd Operation Length field. The length of the hash is specified by the Hash Length field. The IV for the hash is fetched from memory using the address specified in the Authentication IV Address field, and, if required, the hash key is fetched from the memory address specified by the contents of the Authentication Key Address field. The hash will be written to memory immediately following the last byte to be authenticated. The complete hash state will be written to memory starting at the address specified by the contents of the Final Authentication State Address field. This operation can be used to produce the encrypt, then hash for an outbound IPsec ESP with AH packet, or perform the decryption, then hash for an inbound SSL/TLS packet.
67      Perform the hash specified in the Authentication Type field, then perform the cipher specified in the Encryption Type field. The field to be hashed starts at the source address and the length of the field to be hashed is specified in the Length field. The number of bytes in the computed hash is specified in the Hash Length field. The IV for the hash is fetched from memory using the address specified in the Authentication IV Address field, and, if required, the hash key is fetched from the memory address specified by the contents of the Authentication Key Address field. The hash will be written to memory immediately following the last byte of the field being hashed. The complete hash state will be written to memory starting at the address specified by the contents of the Final Authentication State Address field. The field to be encrypted is specified with the 2nd Operation Offset field by adding this offset to the Source Address field. The length of the data to be encrypted or decrypted is specified by the 2nd Operation Length field. This operation can be used to produce the hash, then encrypt operation for an outbound SSL/TLS packet, or hash, then decrypt operation for an inbound IPsec ESP with AH packet.
68-69   Reserved
70      This operation is substantially similar to opcode 66, except that the data field is encrypted in-place, regardless of the setting of opcode bit 0 (copy/in-place). The computed hash result is written to the address specified by the Destination Address field, instead of immediately following the field to be hashed. The complete hash state is written to memory starting at the address specified by the contents of the Final Authentication State field.
71      This operation is substantially similar to opcode 67, except that the data field is encrypted in-place, regardless of the setting of opcode bit 0 (copy/in-place). The computed hash result is written to the address specified by the Destination Address field, instead of immediately following the field to be hashed. The complete hash state is written to memory starting at the address specified by the contents of the Final Authentication State field.
72      Reserved
73      This operation is substantially similar to opcode 65, except that no data is copied, regardless of the setting of opcode bit 0 (copy/in-place). Instead, an expected hash value is stored at the address specified by the Destination Address field. The crypto co-processor 250 computes the hash, compares it to the expected hash, then sets the AuthenticationFail bit of the Hardware Status field appropriately, then writes the computed hash to the location specified in the Destination Address field (overwriting the expected hash value). The complete hash state is written to memory starting at the address specified by the contents of the Final Authentication State field.
74      This operation is substantially similar to opcode 66, except that the data field is encrypted in-place, regardless of the setting of opcode bit 0 (copy/in-place). Instead, an expected hash value is stored at the address specified by the Destination Address field. The crypto co-processor 250 computes the hash, compares it to the expected hash, then sets the AuthenticationFail bit of the Hardware Status field appropriately, then writes the computed hash to the location specified in the Destination Address field (overwriting the expected hash value). The complete hash state is written to memory starting at the address specified by the contents of the Final Authentication State field.
75      This operation is substantially similar to opcode 67, except that the data field is encrypted in-place, regardless of the setting of opcode bit 0 (copy/in-place). Instead, an expected hash value is stored at the address specified by the Destination Address field. Hardware computes the hash, compares it to the expected hash, then sets the AuthenticationFail bit of the Hardware Status field appropriately, then writes the computed hash to the location specified in the Destination Address field (overwriting the expected hash value). The complete hash state is written to memory starting at the address specified by the contents of the Final Authentication State field.
76-127  Reserved
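
The split of the operation field into a copy/in-place flag and a 7-bit opcode can be illustrated with a minimal Python sketch. It assumes the field is 8 bits wide (one flag bit plus the 7 opcode bits described above); the function names are hypothetical and are not part of the described hardware.

```python
IN_PLACE_FLAG = 0x80  # most significant bit (bit 0 in the text's numbering)
OPCODE_MASK = 0x7F    # remaining 7 bits (bits 1:7) hold the opcode


def pack_operation(opcode, in_place):
    """Build the 8-bit operation field from an opcode and the in-place flag."""
    if not 0 <= opcode <= 127:
        raise ValueError("opcode must fit in 7 bits")
    return (IN_PLACE_FLAG if in_place else 0) | opcode


def unpack_operation(field):
    """Return (in_place, opcode) for an 8-bit operation field."""
    return bool(field & IN_PLACE_FLAG), field & OPCODE_MASK


def hardware_inspects_packet(opcode):
    """Opcodes 64 and above name the operations directly; below 64 the
    hardware inspects the packet to find the fields to process."""
    return opcode < 64
```

For example, opcode 66 (cipher, then authenticate) requested as an in-place operation packs to 0xC2 under these assumptions.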

[0079] The Dir (Direction) bit indicates whether the datagram is inbound (Dir=1) or outbound (Dir=0). The Dir bit controls whether authentication generates or checks the MAC, and whether the packet is encrypted or decrypted. Inbound packets are decrypted and the MAC is checked; outbound packets are encrypted and the MAC is generated. The Direction field is only significant for valid opcodes in ranges 0-15 and 32-63. It is ignored for all other opcodes.

[0080] The SoB (Start of Buffer) bit is set to 1 for the initial control word. It indicates the first control word of a control word block. The SoB bit is also set to 1 for a complete control word.

[0081] The EoB (End of Buffer) bit is set to 0 for the initial control word. The EoB bit is set to 1 for a complete control word or a final control word.

[0082] The Int (Interrupt) bit causes the crypto co-processor 250 to interrupt the thread (Core) specified in the Core ID field upon completion of this control word block.

[0083] The Core ID specifies which thread within the CPU 205 should receive an interrupt if the Int bit is set.

[0084] The Authentication Type field specifies what algorithm to use for authentication. Table 2 below documents the meaning of the bits in this field according to an exemplary embodiment of the present invention.

TABLE 2 Authentication Type Field

Bit Position   Name           Description
0              Valid          If one, perform authentication. If zero, do not perform authentication.
1              SHA-1          If one, perform SHA-1 authentication.
2              SHA-256        If one, perform SHA-256 authentication.
3              MD5            If one, perform MD5 authentication.
4              HMAC_SHA-1     If one, perform the HMAC using SHA-1.
5              HMAC_SHA-256   If one, perform the HMAC using SHA-256.
6              HMAC_MD5       If one, perform the HMAC using MD5.
7              CRC32c         If one, perform the iSCSI/RDMA CRC32c computation using the polynomial 0x11EDC6F41.

[0085] Only one of bits 1 to 7 can be set. For an outbound packet, the crypto co-processor 250 will write the computed hash value at the appropriate offset in the destination packet. For an inbound packet computed in-place, the crypto co-processor 250 will generate and check the hash against the incoming hash value and set the hardware status flag appropriately. For an inbound packet processed with a copy operation, the crypto co-processor 250 will compute the hash and store it at the appropriate offset in the destination packet, in addition to comparing the hash and setting the status flag. In the event of an authentication failure, software 204 can then inspect the original and computed hash values.
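
The one-algorithm rule for the Authentication Type field can be sketched in Python as follows. The sketch assumes an 8-bit field in which bit position 0 is the most significant bit, matching the numbering used for the operation field, and it assumes that exactly one algorithm bit accompanies a set Valid bit. Both points, and the helper names, are illustrative assumptions.

```python
# Bit positions of the algorithm-select bits in the Authentication Type field.
AUTH_ALGORITHMS = {
    1: "SHA-1", 2: "SHA-256", 3: "MD5",
    4: "HMAC_SHA-1", 5: "HMAC_SHA-256", 6: "HMAC_MD5", 7: "CRC32c",
}


def bit_mask(position, width=8):
    """Mask for a bit position, counting position 0 as the MSB (assumption)."""
    return 1 << (width - 1 - position)


def make_auth_type(algorithm):
    """Set the Valid bit plus the single bit selecting `algorithm`."""
    for position, name in AUTH_ALGORITHMS.items():
        if name == algorithm:
            return bit_mask(0) | bit_mask(position)
    raise ValueError("unknown authentication algorithm: " + algorithm)


def decode_auth_type(field):
    """Return the selected algorithm name, or None if the Valid bit is clear."""
    if not field & bit_mask(0):
        return None
    selected = [name for position, name in AUTH_ALGORITHMS.items()
                if field & bit_mask(position)]
    if len(selected) != 1:
        raise ValueError("exactly one of bits 1 to 7 must be set")
    return selected[0]
```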

[0086] The Encryption Type field specifies what algorithm to use for encryption. Table 3 below documents the meaning of the bits in this field. The key schedules can be generated by hardware for those ciphers requiring key generation (AES). Alternatively, the key schedule can be loaded from memory.

TABLE 3 Encryption Type Field

Bit Position   Name        Description
0              Valid       If one, perform encryption. If zero, do not perform encryption.
1              IV Valid    If one, use the initialization vector at the Encryption Initialization Vector Address. If zero, use an initialization vector of all zeroes.
2:5            Algorithm   Specifies the encryption algorithm to use:
                           0000 - DES
                           0001 - Triple DES
                           0010 - RC4
                           0011 - Reserved
                           0100 - AES 128
                           0101 - AES 192
                           0110 - AES 256
                           0111 - Reserved
                           1000-1011 - Reserved
                           1100 - AES 128 counter mode
                           1101 - AES 192 counter mode
                           1110 - AES 256 counter mode
                           1111 - Reserved
6:7            Chaining    Specifies the type of chaining:
                           00 - ECB
                           01 - CBC
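
A minimal Python sketch of packing the Encryption Type field from the Table 3 codes follows. It assumes an 8-bit field with bit position 0 as the most significant bit, so that Valid, IV Valid, Algorithm, and Chaining occupy bits 0, 1, 2:5, and 6:7 from the top down. The dictionary keys and the function name are hypothetical labels.

```python
# Algorithm codes for bits 2:5 and chaining codes for bits 6:7 (Table 3).
ALGORITHM_CODES = {
    "DES": 0b0000, "3DES": 0b0001, "RC4": 0b0010,
    "AES-128": 0b0100, "AES-192": 0b0101, "AES-256": 0b0110,
    "AES-128-CTR": 0b1100, "AES-192-CTR": 0b1101, "AES-256-CTR": 0b1110,
}
CHAINING_CODES = {"ECB": 0b00, "CBC": 0b01}


def pack_encryption_type(algorithm, chaining="ECB", iv_valid=False, valid=True):
    """Pack the field, assuming bit position 0 is the most significant bit."""
    return ((int(valid) << 7)                  # bit 0: Valid
            | (int(iv_valid) << 6)             # bit 1: IV Valid
            | (ALGORITHM_CODES[algorithm] << 2)  # bits 2:5: Algorithm
            | CHAINING_CODES[chaining])        # bits 6:7: Chaining
```

Under these assumptions, AES 128 in CBC mode with a supplied initialization vector packs to 0xD1.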

[0087] The Length field specifies the length (in bytes) of the first source data block. The first data block starts at the source address specified in the initial control word.

[0088] The HW Status field is used by the crypto co-processor 250 to return error status about the authentication and encryption operation to the software 204. If no status bits are set, the operation completed successfully. Table 4 provides additional details of the HW Status field.

TABLE 4 HW Status Field

Bit Position   Name                               Description
0              AuthenticFail                      This bit is set by the crypto co-processor 250 if authentication fails for inbound packets.
1              IPv4 checksum failure              The IPv4 header checksum for an inbound packet failed validation.
2              TCP/UDP payload checksum failure   The TCP or UDP payload checksum failed validation.
3              MPA marker failure                 At least one of the markers in the MPA FPDU failed to match the expected offset from the start of the FPDU.
4              MPA CRC32c failure                 The computed CRC32c in an MPA FPDU failed to match the expected value.
5              EncryptLenFail                     The encryption length is not a multiple of the cipher block size.
6              UncorrectableHardwareError         A hardware error occurred which the hardware could not correct. The major cause of a hardware error is an uncorrectable data error when fetching data from memory or reading it from a register.
7              Reserved
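
Software 204 could turn the HW Status field into a list of named conditions along the lines of the Python sketch below. It assumes an 8-bit field with bit position 0 as the most significant bit; the function name is hypothetical.

```python
# Names of HW Status bits 0 through 6 (bit 7 is reserved), per Table 4.
HW_STATUS_NAMES = [
    "AuthenticFail",
    "IPv4 checksum failure",
    "TCP/UDP payload checksum failure",
    "MPA marker failure",
    "MPA CRC32c failure",
    "EncryptLenFail",
    "UncorrectableHardwareError",
]


def decode_hw_status(status):
    """List the names of all set status bits; an empty list means success.

    Assumes bit position 0 is the most significant bit of an 8-bit field.
    """
    return [name for position, name in enumerate(HW_STATUS_NAMES)
            if status & (0x80 >> position)]
```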

[0089] The Authentication Key Address field holds the physical address of the key to use for authentication.

[0090] The Authentication IV (initialization vector) Address field holds the physical address of the IV for authentication. The appropriate number of bytes starting at this address (e.g., MD5: 16; SHA-1: 20; SHA-256: 32) are used to load the authentication state.

[0091] The Final Authentication State Address field holds the physical address where the complete contents of the authentication state will be written when the authentication operation is complete. The appropriate number of bytes (e.g., MD5: 16; SHA-1: 20; SHA-256: 32) are written to memory at this address.

[0092] The Encryption Key Address field holds the physical address of the key (or key schedule) to use for encryption.

[0093] The Encryption Initialization Vector Address field holds the physical address of the initialization vector for encryption. In the event that the initial control word specifies that encryption should not be performed or that the initialization vector should be zeroes, this field is ignored by the crypto co-processor 250. For SSL or TLS encryption operation, this pointer may also point to a 32-bit sequence number (that is, the sequence number may be stored after the IV in memory).

[0094] The Source Address field specifies the physical address location of the first segment of the source packet or data. From this address, the crypto co-processor 250 will process the number of bytes specified in the Length field.

[0095] The Destination Address field specifies the physical address where the crypto co-processor 250 should write its results. Software 204 must allocate space for the crypto co-processor 250 result (and ensure that the crypto co-processor 250 will not overwrite other data). For some crypto co-processor 250 operations (e.g., opcodes 73-75), this field specifies the location of a hash result that the hardware compares with its generated result.

[0096] The Hash Length field specifies the length of the hash value in bytes that the crypto co-processor 250 will generate and/or compare. The value in the Hash Length field is one less than the actual hash length computed or compared (e.g., 255 means a 256-byte hash will be computed or compared).

[0097] The 2nd Operation Offset field specifies the offset from the source address for the second of two operations specified by opcodes 66 and 67. The offset is a signed, two's-complement 16-bit number that is added to the Source Address to specify the starting address for the second operation. This field is ignored for operations other than those specified by opcodes 66 and 67.
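
Because the 2nd Operation Offset is a signed two's-complement value, the starting address of the second operation may lie before or after the Source Address. A minimal Python sketch of the address computation follows; the function name is hypothetical.

```python
def second_operation_address(source_address, offset_field):
    """Add the signed 16-bit 2nd Operation Offset to the Source Address."""
    if not 0 <= offset_field <= 0xFFFF:
        raise ValueError("offset field must be a 16-bit value")
    # Interpret the raw 16-bit pattern as two's complement.
    if offset_field & 0x8000:
        signed_offset = offset_field - 0x10000
    else:
        signed_offset = offset_field
    return source_address + signed_offset
```

For example, a raw field value of 0xFFF8 represents an offset of -8, so a Source Address of 0x1000 yields a second-operation start address of 0x0FF8.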

[0098] The 2nd Operation Length field specifies the length (in bytes) of the second operation for operations specified by opcodes 66 and 67. This field is ignored for operations specified by other opcodes.

[0099] FIG. 6 shows an extension control word 600 in accordance with one embodiment of the present invention. The extension control word 600 specifies the location of the second or subsequent source packet fragments. The Length and Source Address fields are used in the same manner as in the initial control word.

[0100] FIG. 7 shows a final control word 700 in accordance with one embodiment of the present invention. The final control word 700 specifies the location of the last source packet fragment. The Length and Source Address fields are used in the same manner as in the initial control word 500. The End of Buffer (EoB) field must be set to one.

[0101] The complete control word specifies a command for a source packet that is stored in one location. It is substantially similar to the initial control word 500; the major difference is that the End of Buffer field is set to one. The complete control word is a control word block that contains a single control word. The fields included in the complete control word are the same as described in the initial control word 500.
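
The four kinds of control word can be told apart by their SoB and EoB bits, as the following Python sketch illustrates. The text states the bit settings for the initial, complete, and final control words; the sketch additionally assumes that extension and final control words carry SoB=0 and that an extension control word carries EoB=0. The function names are hypothetical.

```python
# (SoB, EoB) -> kind of control word. The SoB=0 entries are assumptions.
CONTROL_WORD_KINDS = {
    (1, 0): "initial",
    (1, 1): "complete",
    (0, 0): "extension",
    (0, 1): "final",
}


def control_word_kind(sob, eob):
    """Classify one control word from its SoB and EoB bits."""
    return CONTROL_WORD_KINDS[(sob, eob)]


def is_valid_block(words):
    """Check a sequence of (SoB, EoB) pairs forming one control word block.

    A block is either a single complete control word, or an initial
    control word, zero or more extension words, and a final control word.
    """
    kinds = [control_word_kind(sob, eob) for sob, eob in words]
    if kinds == ["complete"]:
        return True
    return (len(kinds) >= 2
            and kinds[0] == "initial"
            and kinds[-1] == "final"
            and all(kind == "extension" for kind in kinds[1:-1]))
```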

[0102] The crypto co-processor 250 can also be directly accessed via the interface 215. In one embodiment, the interface 215 is a set of hardware registers. The interface 215 allows direct control of crypto co-processor 250 operations without using the control word queue 210. The interface 215 also allows taking a supervisor trap. By way of example, the interface 215 can be useful to support “short” crypto operations, such as encryption of small XML blocks.

[0103] The interface 215 can also use a set of per-thread hardware registers 216 (so each register is replicated 8 times per CPU 205). These registers 216 contain data and commands to instruct the crypto co-processor 250 to perform an operation, as well as containing results from these operations. A bit is defined to enable lazy save/restore of these registers. Each register 216 includes the following information for the crypto co-processor 250: Length, Operation Type, Source Data and Keys and/or initialization vectors.

[0104] After performing the operation, the crypto co-processor 250 will return the result via the result registers. In one embodiment, the supported hashes have a block size of 512 bits, and the ciphers have block sizes of one byte (RC4), 128 bits (AES), and 64 bits (DES); therefore, at least 512 bits must be provided in registers to contain the source data. At least 256 bits must be provided in result registers, since the hash lengths are 128 (MD5), 160 (SHA-1), and 256 (SHA-256), with cipher outputs as mentioned above. For AES, space for a 256-bit key is required, and a 128-bit initialization vector (IV) may be required. Additionally, for RC4, a 258-byte state initialization is required (256-byte state matrix + two 1-byte indices). A separate, threaded set of 33 hardware registers can be used to store the RC4 state and indices.
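
The register counts implied by these sizes can be checked with a short Python calculation. It assumes each hardware register holds one 64-bit word; the constant and function names are hypothetical.

```python
REGISTER_BITS = 64  # assumption: each hardware register holds one 64-bit word


def registers_needed(bits):
    """Smallest number of 64-bit registers that can hold `bits` bits."""
    return -(-bits // REGISTER_BITS)  # ceiling division


SOURCE_DATA_BITS = 512          # one hash block, the largest unit listed
RESULT_BITS = 256               # longest hash output (SHA-256)
AES_KEY_BITS = 256              # largest AES key
AES_IV_BITS = 128               # AES initialization vector
RC4_STATE_BITS = (256 + 2) * 8  # 256-byte state matrix plus two 1-byte indices

for name, bits in [("source data", SOURCE_DATA_BITS),
                   ("result", RESULT_BITS),
                   ("AES key", AES_KEY_BITS),
                   ("AES IV", AES_IV_BITS),
                   ("RC4 state", RC4_STATE_BITS)]:
    print(name, registers_needed(bits))
```

The RC4 state needs 2064 bits, which does not divide evenly by 64, so it rounds up to the 33 registers stated above.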

[0105] FIG. 8A shows a set of non-RC4 registers in accordance with one embodiment of the present invention. The non-RC4 registers are defined as follows, with each register definition occupying 2 rows of FIG. 8A. Register 1 is the control register and must be written last. Registers 2-9 contain source data. Registers 10-13 contain cipher key data. Registers 14-15 contain cipher initialization vector data on input. For AES counter mode, these registers contain the {nonce[31:0]∥IV[63:0]∥counter[31:0]}. The crypto co-processor 250 will initialize the counter block with this value and increment the counter for subsequent blocks. This data is used by the cipher operation unless ECB mode is specified. Registers 16-19 contain the authentication key for HMAC operations. Registers 20-27 contain the ciphertext. Registers 28-31 contain the hash initialization vector (on input), and the hash result on output.
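
The 128-bit AES counter mode block {nonce[31:0]∥IV[63:0]∥counter[31:0]} can be assembled as in the following Python sketch. The concatenation order follows the text. The wrap-around of the 32-bit counter without carry into the IV is an assumption, as are the function names.

```python
def make_counter_block(nonce, iv, counter):
    """Concatenate nonce (32 bits), IV (64 bits), and counter (32 bits)."""
    if not 0 <= nonce < 1 << 32:
        raise ValueError("nonce must fit in 32 bits")
    if not 0 <= iv < 1 << 64:
        raise ValueError("IV must fit in 64 bits")
    if not 0 <= counter < 1 << 32:
        raise ValueError("counter must fit in 32 bits")
    return (nonce << 96) | (iv << 32) | counter


def next_counter_block(block):
    """Increment only the low 32-bit counter for the next cipher block.

    Assumption: the counter wraps modulo 2**32 and never carries into the IV.
    """
    counter = (block + 1) & 0xFFFFFFFF
    return (block & ~0xFFFFFFFF) | counter
```

With nonce 0x11223344, IV 0x5566778899AABBCC, and counter 1, the block is 0x112233445566778899AABBCC00000001, and the following block differs only in its low 32 bits.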

[0106] Control register 0 controls access to the crypto co-processor 250. It is a privileged, per strand hardware register. FIG. 8B shows a format of Control Register 0 in accordance with one embodiment of the present invention. The register 0 includes an enabled bit. This enabled bit controls access to the crypto co-processor 250 between processes running on the same thread. If enabled is set, any read or write to control register 1, the source data registers, key data registers, and result data/initialization vector registers is permitted and performed. If enabled is not set, any read or write to these registers causes a precise trap. Accesses to control register 0 (by privileged code) are always allowed. As part of a context switch, software 204 should reset the enabled bit. If another process accesses the facility, the resulting trap can be used to save the registers. The trap handler should set the enabled bit (to grant itself access to the registers), then save crypto state. The trap handler must not be interrupted. The handler can determine if the registers must be saved. After the registers have been saved, control can be transferred to the interrupted process.

[0107] The software 204 can also ensure that the crypto co-processor 250 is idle (therefore having completed the previous operation) before attempting to save the state from the previous process. Some operations are not restartable since the initialization vector registers are shared with the result registers. An exemplary prototype trap handler organization is:

[0108] Set control register 0 enabled bit

[0109] Read control register 1 using synchronous stalling load (the load will wait for the operation to complete; the Busy bit will then be 0).

[0110] Save control registers (other crypto state, such as RC4 state, may also have to be saved).

[0111] Re-enable the process that caused the trap.
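
The handler steps above can be modeled in software to show how the enabled bit defers the register save until another process actually touches the crypto registers. The following Python sketch is a toy model only: the class, its method names, and the use of a dictionary for the register file are illustrative assumptions, and the stalling load is reduced to a placeholder.

```python
class LazyCryptoContext:
    """Toy model of one hardware thread's crypto registers."""

    def __init__(self):
        self.enabled = False   # control register 0 enabled bit
        self.registers = {}    # live contents: register number -> value
        self.owner = None      # process whose state is live in the registers
        self.saved = {}        # per-process save areas kept by software

    def context_switch(self):
        """Software resets the enabled bit; nothing is saved yet."""
        self.enabled = False

    def write(self, process, register, value):
        """A register write traps first if the enabled bit is clear."""
        if not self.enabled:
            self._trap_handler(process)
        self.registers[register] = value

    def _trap_handler(self, process):
        self.enabled = True          # set the enabled bit
        self._wait_until_idle()      # stalling read of control register 1
        if self.owner is not None and self.owner != process:
            # Save the previous owner's state, then restore this process's.
            self.saved[self.owner] = dict(self.registers)
            self.registers = dict(self.saved.get(process, {}))
        self.owner = process         # resume the process that trapped

    def _wait_until_idle(self):
        pass  # placeholder for the synchronous stalling load
```

In this model a process that is switched out and back in without any other process using the crypto registers causes one trap but no save, which is the case where the handler determines that the registers need not be saved.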

[0112] FIG. 8C shows an exemplary format of control register 1 in accordance with one embodiment of the present invention. The operation field depicts the operation to be performed. The control register 1 must be the last register to be written before starting an operation (i.e., all other registers must be written first). When an operation is started by writing to the control register, the crypto co-processor 250 will arbitrate among the control word queue operations that are pending as well as any other register operations from other threads before starting this operation. The crypto co-processor 250 will set the status field when either the operation is complete or an error has been detected. Fields in the control register 1 are described below.

[0113] Operation: The operation field encodes the operation to be performed as follows. For cipher operations, the Direction field specifies whether encryption or decryption is performed. The operation field codes are similar to their control word counterparts but direct IPsec or SSL/TLS decoding functions are not supported. The following Table 6 shows the operation field code definitions according to one embodiment of the present invention.

TABLE 6 Operation Field bits 0:7

Value    Meaning
0-63     Reserved
64       Only perform the cipher as specified in the Encryption Type field starting at the most significant byte of Source Data Register 0 + the number of bytes specified in the Encryption Offset field, and encrypt the number of bytes specified by the Encryption Length field. The encrypted (or decrypted) result will be written to the Cipher Result registers, at the same corresponding offset. Software 204 must ensure that the Encryption Length is a multiple of the block cipher length.
65       Only perform the authentication as specified in the Authentication Type field starting at the most significant byte of Source Data Register 0 + the number of bytes specified in the Hash Offset field, for the number of bytes specified in the Hash Length field. The crypto co-processor 250 will implicitly pad the hash length as required by the hash algorithm. The hash will be written to the Hash Result registers. Unused bytes within a Hash Result register will be filled with zeroes (for SHA-1, only the least significant 4 bytes of Hash Result Register 2 will be filled, and the remaining bytes will be zeroed, and Hash Result Register 3 will not be affected).
66       Perform the cipher specified in the Encryption Type field, then perform the authentication specified by the Authentication Type field. The cipher operation starts at the most significant byte of Source Data Register 0 + the Encryption Offset. The length of the cipher is specified by the Cipher Length field and must be a multiple of the block cipher length. The cipher data will be written to the Cipher Result registers starting at the corresponding byte location of the source; unused result registers will not be affected. The source address of the hash is computed by adding the Hash Offset field to the location of the most significant byte of Source Data Register 0. The length of the field to hash is specified by the Hash Length field. The hash will be written to the Hash Result registers as specified in Operation 65 above. This operation can be used to produce the encrypt, then hash for an outbound IPsec ESP with AH packet, or perform the decryption, then hash for an inbound SSL/TLS packet.
67       Perform the hash specified in the Authentication Type field, then perform the cipher specified in the Encryption Type field. The field to be hashed starts at the most significant byte of Source Data Register 0 + the contents of the Hash Offset field; the length of the hash is specified in the Hash Length field. The cipher starts at the most significant byte of Source Data Register 0 + the Encryption Offset field; the length of the cipher is specified in the Encryption Length field. The hash will be written to the Hash Result registers as in Operation 65 above. The ciphertext will be written to the Cipher Data registers as specified in Operation 64 above. This operation can be used to produce the hash, then encrypt operation for an outbound SSL/TLS packet, or hash, then decrypt operation for an inbound IPsec ESP with AH packet.
68-255   Reserved

[0114] Dir (Direction) bit: The direction bit specifies the direction of cipher operation for AES and DES/3DES. If set, encryption is performed; if reset, decryption is performed.

[0115] Authentication Type: This field specifies what algorithm to use for authentication. Table 7 below documents the meaning of the bits in this field according to one embodiment. Only one of bits 1 to 7 is typically set at a time.

TABLE 7 Authentication Type Field

Bit Position   Name           Description
0              Valid          If one, perform authentication. If zero, do not perform authentication.
1              SHA-1          If one, perform SHA-1 authentication.
2              SHA-256        If one, perform SHA-256 authentication.
3              MD5            If one, perform MD5 authentication.
4              HMAC_SHA-1     If one, perform the HMAC using SHA-1.
5              HMAC_SHA-256   If one, perform the HMAC using SHA-256.
6              HMAC_MD5       If one, perform the HMAC using MD5.
7              CRC32c         If one, perform the iSCSI/RDMA CRC32c computation using the polynomial 0x11EDC6F41.

[0116] The encryption type field specifies what algorithm to use for encryption. Table 8 below documents the meaning of the bits in this field according to one embodiment. The key schedules can be generated by the crypto co-processor 250 for those ciphers requiring key generation (AES). Alternatively, a key schedule can also be loaded from memory.

TABLE 8 Encryption Type Field

Bit Position   Name        Description
0              Valid       If one, perform encryption. If zero, do not perform encryption.
1              IV Valid    If one, use the initialization vector in the Cipher Initialization Vector registers. If zero, use an initialization vector of all zeroes.
2:5            Algorithm   Specifies the encryption algorithm to use:
                           0000 - DES
                           0001 - Triple DES
                           0010 - RC4
                           0011 - Reserved
                           0100 - AES 128
                           0101 - AES 192
                           0110 - AES 256
                           0111 - Reserved
                           1000-1011 - Reserved
                           1100 - AES 128 counter mode
                           1101 - AES 192 counter mode
                           1110 - AES 256 counter mode
                           1111 - Reserved
6:7            Chaining    Specifies the type of chaining:
                           00 - ECB
                           01 - CBC

[0117] The HW status field contains status pertaining to the current (or last) co-processor 250 operation. Table 9 defines the HW status field contents according to one embodiment.

TABLE 9 HW Status Field

Bit Position   Name                         Description
0              Busy/NotIdle                 This bit is set upon writing Control Register 1 with a valid operation. It is reset when the crypto unit completes the specified operation. Software 204 may poll this bit to determine when the crypto operation is complete. Alternatively, a load of this register will stall until the Busy/NotIdle bit is reset, to allow software to wait synchronously.
1              CipherLengthError            This bit is set when the crypto co-processor 250 detects an invalid length for the specified cipher operation.
2              HashLengthError              This bit is set when hardware detects a non-zero Hash Offset field with a 64 byte Hash Length field.
3              UncorrectableHardwareError   A hardware error occurred which the crypto co-processor 250 could not correct. The major cause of a hardware error is an uncorrectable data error when fetching data from memory or reading it from a register.
4-7            Reserved

[0118] The en (encryption) length field specifies the length of cipher operations. Table 10 describes the en length field contents according to one embodiment. Only certain values are permitted for a given operation. The crypto co-processor 250 may check the length and signal an invalid length error if these rules are violated. The value in the en length field is one less than the cipher length (e.g., 0x0f is stored for a 16 byte operation).

TABLE 10

Operation   Permissible Length field values (in bytes)
AES-128     {16, 32, 48, 64} − 1
AES-192     {16, 32, 48, 64} − 1
AES-256     {16, 32, 48, 64} − 1
DES         {8, 16, 24, 32, 40, 48, 56, 64} − 1
3DES        {8, 16, 24, 32, 40, 48, 56, 64} − 1
RC4         {1 ... 64} − 1
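
The permissible cipher lengths and the one-less-than encoding of the en length field can be sketched in Python as follows. The dictionary keys and the function name are hypothetical; the permitted byte counts are those listed in Table 10.

```python
# Permitted cipher lengths in bytes, before subtracting one (Table 10).
PERMITTED_CIPHER_LENGTHS = {
    "AES-128": range(16, 65, 16),
    "AES-192": range(16, 65, 16),
    "AES-256": range(16, 65, 16),
    "DES": range(8, 65, 8),
    "3DES": range(8, 65, 8),
    "RC4": range(1, 65),
}


def encode_en_length(cipher, length_bytes):
    """Return the en length field value for a cipher operation.

    The stored value is one less than the cipher length in bytes.
    """
    if length_bytes not in PERMITTED_CIPHER_LENGTHS[cipher]:
        raise ValueError("invalid cipher length for " + cipher)
    return length_bytes - 1
```

A 16-byte AES-128 operation is therefore stored as 0x0f, matching the example in the text.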

[0119] The ha (hash) length field specifies the length of hash operations. Table 11 describes the ha length field contents according to one embodiment. If the hash length is not 64, the crypto co-processor 250 will pad the length as required by the hash algorithm prior to performing the hash. The value in the ha length field is one less than the hash data length (e.g., 0x0f is stored for a 16 byte operation).

TABLE 11

Operation      Permissible Length field values (in bytes)
MD5            {1 ... 64} − 1
SHA-1          {1 ... 64} − 1
SHA-256        {1 ... 64} − 1
HMAC_MD5       {1 ... 64} − 1
HMAC_SHA-1     {1 ... 64} − 1
HMAC_SHA-256   {1 ... 64} − 1

[0120] The en (encryption) offset is a positive offset in bytes from the high-order byte of source data register 0 where the encryption operation will start. The crypto co-processor can support an arbitrary byte alignment. The crypto co-processor 250 can also signal a length error if (encryption offset + encryption length) > 64. The en offset field may be used, for example, when processing the first block of an outbound IPsec ESP mode packet with AH. In this case the encryption offset is 8 bytes from the start of the data field, to allow the software 204 to load the data registers starting with the AH.

[0121] The ha (hash) offset is a positive offset in bytes from the high-order byte of source data register 0 where the hash operation will start. The crypto co-processor 250 can support an arbitrary byte alignment. For a 64-byte hash, the ha offset field should be 0. The crypto co-processor 250 can also signal an invalid hash length error otherwise. In practice, only one of the en offset or ha offset fields would be nonzero.
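The offset rules of the two preceding paragraphs can be expressed as simple range checks. The Python sketch below is illustrative only: the function names are hypothetical, lengths are actual byte counts (not the stored length-minus-one values), and the bound applied to hashes shorter than 64 bytes is an assumption carried over from the encryption rule, since the text only states the 64-byte case.

```python
DATA_WINDOW_BYTES = 64  # upper bound taken from the "> 64" length check in the text


def check_en_window(en_offset, en_length):
    """True if the encryption offset and length fit within the 64-byte window."""
    return en_offset >= 0 and en_length >= 1 and en_offset + en_length <= DATA_WINDOW_BYTES


def check_ha_window(ha_offset, ha_length):
    """True if the hash offset and length are acceptable."""
    if ha_offset < 0 or not 1 <= ha_length <= DATA_WINDOW_BYTES:
        return False
    if ha_length == DATA_WINDOW_BYTES:
        return ha_offset == 0  # stated rule: a 64-byte hash requires a zero offset
    # Assumption: shorter hashes obey the same bound as encryption.
    return ha_offset + ha_length <= DATA_WINDOW_BYTES
```

With the IPsec example above, check_en_window(8, 48) passes, whereas an 8-byte offset combined with a 64-byte cipher length would be rejected.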

[0122] The following Table 12 describes how to fill in the registers for each operation type that is supported, as well as limitations on the operation type, for one embodiment.

TABLE 12
Operation | Control Register | Source Data | Key Data | Initialization Vector | Result | Restrictions
AES encryption/decryption | Fill in Operation, Direction, Encryption Type, Encryption Offset, and Encryption Length (length must be a multiple of 16 bytes) | 2, 4, 6, or 8 64-bit words | 2, 3, or 4 64-bit words based upon key size | 2 64-bit words or all zeroes if unused | Same as source length | Source data can only be 2, 4, 6, or 8 64-bit words
DES encryption/decryption | Fill in Operation, Direction, Encryption Type, Encryption Offset, and Encryption Length (length must be a multiple of 8 bytes) | 1 to 8 64-bit words | 1 64-bit word with low-order bit of each byte odd parity | 1 64-bit word or all zeroes if unused | 1 to 8 64-bit words as source length | Source data must be 1 to 8 64-bit words in length
3DES encryption/decryption | Same as DES | Same as DES | 3 64-bit words with low-order bit of each byte odd parity | Same as DES | Same as DES | Same as DES
RC4 | Fill in Operation, Direction, Encryption Type, Encryption Offset, Encryption Length, and State Registers | Any number of bytes from 1 to 64 | - | - | Same as source length | RC4 state matrix must be preloaded via 33 64-bit register writes prior to performing this operation
MD5 | Fill in Operation, Authentication Type, Hash Offset, and Hash length | Any number of bytes from 1 to 64 | Ignored | 2 64-bit words as necessary (or all zeroes) | 2 64-bit words | Hardware will pad input data
SHA-1 | Fill in Operation, Authentication Type, Hash Offset, and Hash length | Any number of bytes from 1 to 64 | Ignored | 2.5 64-bit words (160 bits) as necessary, or all zeroes | 2.5 64-bit words (160 bits) | Hardware will pad input data
SHA-256 | Fill in Operation, Authentication Type, Hash Offset, and Hash length | Any number of bytes from 1 to 64 | Ignored | 4 64-bit words as necessary, or all zeroes | 4 64-bit words | Hardware will pad input data
HMAC_MD5 | Fill in Operation, Authentication Type, Hash Offset, and Hash length | Any number of bytes from 1 to 64 | Fill in first 2 64-bit key words with 128-bit key | 2 64-bit words as necessary (or all zeroes) | 2 64-bit words | Hardware will pad input data
HMAC_SHA-1 | Fill in Operation, Authentication Type, Hash Offset, and Hash length | Any number of bytes from 1 to 64 | Fill in first 2.5 64-bit words (160 bits) with key | 2.5 64-bit words (160 bits) as necessary, or all zeroes | 2.5 64-bit words (160 bits) | Hardware will pad input data
HMAC_SHA-256 | Fill in Operation, Authentication Type, Hash Offset, and Hash length | Any number of bytes from 1 to 64 | Fill in 4 64-bit words with 256-bit key | 4 64-bit words as necessary, or all zeroes | 4 64-bit words | Hardware will pad input data
Encrypt then Hash or Hash then Encrypt | Fill in Operation, Authentication Type, Hash Offset, Hash Length, Encryption Type, Dir, Encryption Offset, and Encryption Length | Any number of bytes for hash; cipher length must be a multiple of block cipher length | As required | As required | As specified | Cipher Length must be a multiple of block cipher length

[0123] FIG. 9 shows the RC4 state registers 900 according to one embodiment of the present invention. State register i contains bytes 8*i to 8*i+7, inclusively. Registers 4-27 are omitted from FIG. 9 for brevity but are actually included.
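The byte-to-register mapping stated above (state register i holds bytes 8*i to 8*i+7) can be sketched in Python as follows. The function names are hypothetical, and placing the lowest-numbered byte in the high-order position of each 64-bit register is an assumption. The sketch covers only the 256-byte state in registers 0 to 31 and does not model any further register.

```python
RC4_STATE_BYTES = 256
BYTES_PER_REGISTER = 8


def rc4_state_register(byte_index):
    """Return (register number, byte position within the register) for a state byte."""
    if not 0 <= byte_index < RC4_STATE_BYTES:
        raise ValueError("RC4 state byte index out of range")
    return divmod(byte_index, BYTES_PER_REGISTER)


def pack_rc4_state(state):
    """Pack a 256-byte RC4 state into 32 64-bit register values."""
    if len(state) != RC4_STATE_BYTES:
        raise ValueError("RC4 state must be exactly 256 bytes")
    # Assumption: byte 8*i occupies the high-order byte of register i.
    return [
        int.from_bytes(state[i:i + BYTES_PER_REGISTER], "big")
        for i in range(0, RC4_STATE_BYTES, BYTES_PER_REGISTER)
    ]
```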

[0124] FIG. 10 is a block diagram of the crypto co-processor 250 in accordance with one embodiment of the present invention. The crypto co-processor can include multiple specialized cryptographic (crypto) units 1005, 1010, 1015, 1020, encryption unit 1025 and authentication unit 1030. Each of the crypto units 1005, 1010, 1015, 1020 can be a specifically designed arithmetic unit that has been designed and optimized for a specific type of cryptographic processing. By way of example, crypto unit 1005 is designed and optimized for DES processing, while crypto unit 1010 is optimized for AES processing and crypto units 1015, 1020 are optimized for yet another type of crypto processing such as performing hashing functions (e.g., RC4, MD5, SHA-1, SHA-256) or key generation functions or other types of crypto processing (e.g., elliptic curve cryptography). As a result, the corresponding type of crypto unit can be selected for the type of cryptographic operations required which thereby speeds the crypto processing. Further, having multiple crypto units 1005, 1010, 1015, 1020 can allow the crypto co-processor 250 to process multiple crypto packets simultaneously (i.e., multi-threading) which can further accelerate the crypto packet processing. Further, if the crypto co-processor 250 can process more than one crypto packet at a time, then the CPU 205 will not be stalled if a second crypto packet is received before the crypto co-processor has completed processing of a first crypto packet. This multi-threading capability can also provide more efficient crypto processing for nested crypto processes and for processing encrypted data streams.

[0125] IPsec is an abbreviation for IP Security and includes a set of protocols developed by the IETF to support secure exchange of packets at the IP layer. IPsec has been deployed widely to implement Virtual Private Networks (VPNs). IPsec supports two encryption modes: Transport and Tunnel. Transport mode encrypts only the data portion (payload) of each packet. The Tunnel mode encrypts both the header and the payload. On the receiving side, an IPsec-compliant device decrypts each packet. For IPsec to work, the sending and receiving devices must share a public key. This is accomplished through a protocol known as Internet Security Association and Key Management Protocol (ISAKMP), which allows the receiver to obtain a public key and authenticate the sender using digital certificates.

[0126] There are 4 basic formats of an IPsec packet, relating to AH/ESP and transport/tunnel modes. There are sub-formats based upon IPv4 or IPv6, which we will not detail here. IPv4 is used to illustrate the processing; IPv6 is similar. The crypto co-processor 250 will support parsing of IPv4 and IPv6 options to locate the AH/ESP headers.

[0127] FIG. 11A shows an exemplary format of an IPsec packet encapsulating a TCP datagram for transport mode AH, in accordance with one embodiment of the present invention. Authentication is performed over the complete packet contents.

[0128] Before passing an outbound IPsec packet to the CPU 205, the software 204 processes the original IP datagram and inserts the authentication header with the next header, length, SPI, and sequence number fields filled in. The crypto co-processor 250 computes the HMAC hash. The crypto co-processor 250 fills in the (typically 96-bit) authentication data field. Authentication covers all fields in the packet except for the mutable fields in the IP header (TOS, TTL, etc.) and the hash result value in the AH. Crypto co-processor 250 zeroes out the mutable fields prior to the hash computation. An appropriate hash hardware unit of the crypto co-processor 250 implicitly pads the packet data as required by the HMAC algorithm. Once the crypto co-processor 250 completes processing, the packet is a legal IPsec packet (i.e., no further software formatting is required), and the packet may be passed directly to network software for further processing (e.g., fragmentation).

[0129] The crypto co-processor 250 will support other header lengths to allow for different-length authentication data fields. The crypto co-processor 250 can include one or more hardware units 1005-1030 that can compute the length of the authentication data field as the header length minus the appropriate default. By way of example, for IPv4, hardware will compute the authentication data field length as HeaderLength − 1 32-bit words. The crypto co-processor hardware units 1005-1030 do not change the mutable field contents in memory. The crypto co-processor hardware units 1005-1030 can be an IPv4 and IPv6-aware hardware state machine. The IPv4 AH length field is 2 less than the actual size of the field in 32-bit words.
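The length arithmetic in the preceding paragraph can be checked with a short Python sketch. The function name is hypothetical. The comments rely on the fact that the fixed part of an AH (next header, length, reserved, SPI, and sequence number) occupies three 32-bit words, so a length field that is 2 less than the total size leaves HeaderLength − 1 words of authentication data.

```python
WORD_BYTES = 4  # the AH length field counts 32-bit words


def ah_auth_data_bytes(header_length_field):
    """Authentication data size in bytes, computed as (HeaderLength - 1) 32-bit words."""
    # Total AH size is (field + 2) words; 3 of those words are fixed fields,
    # leaving (field - 1) words for the authentication data.
    words = header_length_field - 1
    if words < 0:
        raise ValueError("AH length field too small")
    return words * WORD_BYTES
```

For the typical 96-bit authentication data field, the length field is 4 and the function returns 12 bytes.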

[0130] One or more of the crypto co-processor hardware units 1005-1030 could also fill in the hash in-place without copying the entire packet to a new memory area. The destination address could be the address of the authentication data field in the AH, and the crypto co-processor 250 hardware would have to be instructed to produce the hash without a copy. Alternatively, the crypto co-processor 250 could parse the packet contents and determine the appropriate AH offset from the length of the IP header.

[0131] For inbound packets, software 204 programs the control word 212 with the address of the packet and the appropriate hash algorithm and key. The crypto co-processor hardware units 1005-1030 compute the hash, compare it with the hash value in the authentication header field, and return status in the control word 212. No copy operation is required.

[0132] FIG. 11B shows an exemplary format of an IPsec packet encapsulating a TCP datagram for tunnel mode AH, in accordance with one embodiment of the present invention.

[0133] The operations performed by software 204 and the crypto co-processor hardware units 1005-1030 for this case are analogous to transport mode AH. For an outbound packet, software prepares the packet in memory and passes the source address, HMAC algorithm, and key to the crypto co-processor 250. Authentication can be done in-place or the packet can be copied to a new destination. The crypto co-processor hardware units 1005-1030 compute the hash and write the hash result in the AH.

[0134] For an inbound packet, software 204 prepares a control word 212 that specifies the source address of the packet, the HMAC algorithm, and key. The crypto co-processor hardware units 1005-1030 compute and validate the hash, and write the validation status into bit 0 of the Hardware Status field of the initial control word.

[0135] FIG. 11C shows an exemplary general packet format for transport mode ESP, in accordance with one embodiment of the present invention. Authentication may be optionally performed over the packet from the ESP header to the ESP trailer fields, inclusively, while encryption is performed over the field following the ESP header to the ESP trailer fields, inclusively. FIG. 11D shows a more detailed view of the packet format starting at the ESP header in accordance with one embodiment of the present invention.

[0136] For an outbound packet, the process is similar to AH mode. Software 204 first produces a skeleton for the packet by filling in the required fields in the packet. This includes the SPI, sequence number, padding, pad length, and next header fields. The padding and pad length must be computed appropriately given the encryption algorithm. The rationale behind requiring software 204 to generate the padding and pad length fields is twofold. The first is to provide flexibility and improved security, in case software wants to “randomize” the packet lengths to frustrate traffic analysis. The second is to simplify the crypto hardware portions 1005-1030. If this flexibility is not required, the crypto co-processor hardware units 1005-1030 can generate the minimal required padding and pad length fields prior to encryption, and then update the IP header length field to account for the padding and pad length fields which were added.

[0137] After filling out the packet skeleton, the software 204 then generates a control word 212 that contains the source address of the packet (perhaps spread across several control words), the authentication algorithm, the authentication key, the cipher algorithm, the cipher key, the cipher initialization vector (IV), and the destination address, and passes it to the crypto co-processor hardware units 1005-1030. The crypto co-processor hardware units 1005-1030 compute the encryption start address and length as follows. The encryption start address is computed as source address + IP header length (including options, if any) + 8; the encryption length is computed as IP_header.len − 12 − 8. The hardware units 1005-1030 will encrypt the relevant fields of the packet, and then authenticate the relevant fields of the packet, writing the HMAC hash to the authentication data fields. The hardware units 1005-1030 can encrypt and hash in parallel, with the hash running slightly behind encryption and using the encrypted data. The hardware units 1005-1030 can also check that the encryption length is a multiple of the cipher block size. If the check fails, the hardware units 1005-1030 should terminate the operation and report the failure.
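The address and length computation described above can be transcribed into a short Python sketch. It follows the two formulas exactly as stated and does not interpret them further; the function and parameter names are hypothetical, and ip_header_len_field simply stands for the value the text calls IP_header.len. Raising an exception stands in for terminating the operation and reporting the failure.

```python
def esp_encrypt_window(source_address, ip_header_length, ip_header_len_field, block_size):
    """Return (encryption start address, encryption length) per the stated formulas."""
    # Start: source address + IP header length (including options) + 8.
    start = source_address + ip_header_length + 8
    # Length: IP_header.len - 12 - 8, transcribed as written in the text.
    length = ip_header_len_field - 12 - 8
    if length <= 0 or length % block_size != 0:
        raise ValueError("encryption length is not a multiple of the cipher block size")
    return start, length
```

For instance, with a source address of 0x1000, a 20-byte IP header, an IP_header.len value of 52, and an 8-byte block cipher, the sketch returns a start address of 0x101C and a length of 32 bytes.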

[0138] An in-place encryption and authentication operation can be performed if the source data exists in one continuous buffer. Otherwise the hardware units 1005-1030 will copy the encrypted, authenticated source packet to the destination.

[0139] For an inbound packet, software 204 constructs a control word 212 that contains the source address, the authentication and decryption algorithms, and associated keys and initial vector, for decryption. The crypto co-processor hardware units 1005-1030 compute the decryption offset and length as above before authenticating and decrypting the packet, and returning authentication status. Authentication and decryption can be performed in parallel, with the HMAC using the encrypted data. As for outbound packets, decryption can be performed in-place, or the decrypted, authenticated packet can be copied to a new location.

[0140] FIG. 12 shows a general packet format for a tunnel mode ESP, according to one embodiment of the present invention. Operation in tunnel mode ESP is substantially similar to transport mode described above.

[0141] Tunnel mode bundles (i.e., nested IPsec protocols) are recognized by software 204 and handled as separate control word queue operations. One supported bundle is transport mode AH followed by ESP. FIG. 13 shows an exemplary packet format of a transport mode bundle in accordance with one embodiment of the present invention. In this mode, ESP authentication is not used, so there is no ESP authentication trailer.

[0142] FIG. 14 shows an exemplary packet (record) format before SSL/TLS in accordance with one embodiment of the present invention. SSL is an abbreviation for Secure Sockets Layer, a protocol developed by Netscape for transmitting private documents via the Internet. SSL works by using a public key to encrypt data that's transferred over the SSL connection. Both Netscape Navigator and Internet Explorer support SSL, and many Web sites use the protocol to obtain confidential user information, such as credit card numbers. By convention, URLs that require an SSL connection start with “https:” instead of “http:.”

[0143] TLS is an abbreviation for Transport Layer Security, a protocol that guarantees privacy and data integrity between client/server applications communicating over the Internet. The TLS protocol is made up of two layers: the TLS record protocol and the TLS handshake protocol.

[0144] The TLS record protocol is layered on top of a reliable transport protocol, such as TCP. It ensures that the connection is private by using symmetric data encryption, and it ensures that the connection is reliable. The TLS record protocol is also used for encapsulation of higher-level protocols, such as the TLS handshake protocol. The TLS handshake protocol allows authentication between the server and client and the negotiation of an encryption algorithm and cryptographic keys before the application protocol transmits or receives any data.

[0145] TLS is application protocol-independent. Higher-level protocols can layer on top of the TLS protocol transparently. Based on Netscape's SSL 3.0, TLS supersedes and is an extension of SSL.

[0146] FIG. 15 shows an exemplary encrypted packet format for SSL/TLS in accordance with one embodiment of the present invention. The ciphertext can be 2 KB larger than the plaintext. More specifically, the ciphertext is generated by (optionally) using the HMAC function to produce a hash of the original data field that is appended to the data field, then encrypting the data and optional hash. The HMAC can require a sequence number in addition to a key. The data and the MAC can then be encrypted. In the case of a block cipher, the data and MAC may be padded with up to 255 bytes of padding. A final byte can specify the length of the padded data (exclusive of the pad length byte). FIG. 16 shows an exemplary encrypted packet format with the final byte defining the length of the padded data, in accordance with one embodiment of the present invention.

[0147] For an outbound packet, software 204 prepares a source area in memory with the packet format as above. The software 204 fills in the padding and pad length fields, taking care that the resulting total length is a multiple of the block cipher size, and sets the Length field to be the total data + MAC + Padding + 1. Similar to ESP mode IPsec, forcing software to generate the packet padding allows flexibility and simplifies the hardware units 1005-1030 of the crypto co-processor 250.
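The padding computation that software 204 performs here can be sketched in Python as follows. The function name and the extra_blocks parameter are hypothetical; extra_blocks illustrates the flexibility mentioned above, in which software may add more than the minimal padding, up to the 255-byte limit.

```python
MAX_PADDING = 255  # the pad length is carried in a single byte


def record_padding(data_len, mac_len, block_size, extra_blocks=0):
    """Return (padding length, Length field) so that Length is a block multiple."""
    # Length = data + MAC + padding + 1, where the final 1 is the pad length byte.
    minimal = (-(data_len + mac_len + 1)) % block_size
    padding = minimal + extra_blocks * block_size
    if padding > MAX_PADDING:
        raise ValueError("padding exceeds 255 bytes")
    length_field = data_len + mac_len + padding + 1
    return padding, length_field
```

For 100 bytes of data, a 20-byte MAC, and a 16-byte block cipher, the minimal padding is 7 bytes and the Length field is 128.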

[0148] The hardware units 1005-1030 compute the HMAC over the type, major, minor, length, and data fields (the length of data for HMAC is computed as Length − (pad length + 1 + MAC_length) for TLS and Length − (pad length + 1 + MAC_length + 2) for SSL, where MAC_length is the length of the resulting hash). The hardware portions 1005-1030 also compute the encryption offset as (source address + 5) and the encryption length is Length. The hardware portions 1005-1030 also check to make sure the encryption length is a multiple of the cipher block size. The hardware portions 1005-1030 then encrypt the data, MAC, padding, and pad length fields. The hardware units 1005-1030 overlap as much of the HMAC and encryption as possible.
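The length and offset calculations in the preceding paragraph can be summarized in a short Python sketch. The function name, the is_ssl flag, and the returned dictionary keys are hypothetical; the arithmetic is taken directly from the text.

```python
def record_windows(source_address, length_field, pad_len, mac_len, block_size, is_ssl=False):
    """Compute the HMAC data length and the encryption window for an outbound record."""
    overhead = pad_len + 1 + mac_len + (2 if is_ssl else 0)
    hmac_data_len = length_field - overhead
    if hmac_data_len < 0:
        raise ValueError("Length field is smaller than padding and MAC overhead")
    if length_field % block_size != 0:
        raise ValueError("encryption length is not a multiple of the cipher block size")
    return {
        "hmac_data_len": hmac_data_len,
        "encryption_start": source_address + 5,  # skips type, major, minor, and length
        "encryption_length": length_field,
    }
```

With a Length of 128, 7 bytes of padding, and a 20-byte MAC, the HMAC data length is 100 bytes for TLS and 98 bytes for SSL.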

[0149] Operation is similar for a copy operation, but software 204 may split the source data across multiple control words. In at least one embodiment the destination buffer is contiguous. For an inbound packet, software 204 provides the crypto co-processor 250 with the address of the packet, cipher and authentication algorithm, keys, and initialization vectors. For in-place decryption, hardware units 1005-1030 will decrypt the data, MAC, and padding fields, then authenticate the packet. Since the MAC is encrypted and performed on the cleartext data, authentication may be required to wait for the decryption to complete. The hardware units 1005-1030 will overlap as much of the decryption and authentication as possible. Operations are similar for a copy decryption/authentication.

[0150] XML encryption or XML signature does not require any special support for XML features. The encryption and authentication algorithms can be accessed directly via the control word queue 210 and/or the interface 215. The following are examples of XML encryption/signature algorithms that can be supported:

[0151] SHA-1 (XML Signature, XML Encryption)

[0152] SHA-256 (XML Encryption)

[0153] Triple DES (XML Encryption)

[0154] AES-128 (XML Encryption)

[0155] AES-192 (XML Encryption)

[0156] AES-256 (XML Encryption)

[0157] The crypto co-processor 250 can also support MPA framing and de-framing. The first byte of the source data must point to the first byte of the ULPDU (outbound data) or FPDU (inbound data). For outbound ULPDUs, the crypto hardware units 1005-1030 will support the following operations:

[0158] Given the starting byte offset into the ULPDU of the first marker position and the initial marker value, insert 4-byte markers into the ULPDU.

[0159] Based upon the ULPDU length, insert 0-3 bytes of padding as necessary.

[0160] Calculate the CRC32c.

[0161] Create the new FPDU. This may entail a copy operation or it may be done in-place. For an in-place operation, software 204 must have allowed sufficient room for padding, the markers, and the CRC.
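Two of the outbound steps listed above, the 0-3 bytes of padding and the CRC32c, can be illustrated in Python. This is only a sketch: the function names are hypothetical, the padding is assumed to round the given length up to a multiple of 4 bytes, and exactly which bytes of the FPDU are counted and covered is left to the caller. The CRC32c routine is a plain bitwise implementation of the Castagnoli polynomial, written for clarity, not speed. Marker insertion is not shown.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def crc32c(data):
    """Bitwise CRC32c over a bytes object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def mpa_pad_len(length):
    """Number of padding bytes (0-3) needed to reach a 4-byte multiple (assumed rule)."""
    return (-length) % 4


def pad_and_crc(body):
    """Return (padded body, CRC32c of the padded body)."""
    padded = body + bytes(mpa_pad_len(len(body)))
    return padded, crc32c(padded)
```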

[0162] For inbound FPDUs, the crypto hardware units 1005-1030 will support the following operations:

[0163] Calculate and validate the CRC32c. The crypto hardware portions 1005-1030 will signal an error if there is a CRC mismatch. Given the initial position of the marker relative to the start of the FPDU, validate that all markers contained in the FPDU have appropriate values. The crypto hardware units 1005-1030 will examine the FPDU length and remove padding bytes as necessary. After removing the markers, padding, and CRC from the FPDU to form the ULPDU, the crypto hardware units 1005-1030 will copy the ULPDU to the destination.

[0164] The crypto hardware units 1005-1030 can also support calculation and validation of the TCP or UDP payload and IPv4 header checksums. The crypto hardware units 1005-1030 can either perform just the IPv4 header checksum calculation or the TCP/UDP payload checksum in addition to the IPv4 header checksum. For an outbound IP datagram, hardware will be programmed with the initial byte of the IP header. For IPv4, the crypto hardware units 1005-1030 will parse the header and process options, to locate the start of the TCP or UDP header. The crypto hardware units 1005-1030 will parse the TCP or UDP header to determine the starting point and length of the TCP or UDP payload. Using the pseudo-header, the crypto hardware units 1005-1030 will then calculate the TCP/UDP checksum and place the result in memory at the destination address. The crypto hardware units 1005-1030 will then compute the IP header checksum.
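As a software illustration of the IPv4 header checksum step, the following Python sketch computes the standard 16-bit one's-complement Internet checksum over the header, using the header length nibble to cover any options. The function names are hypothetical, and the sketch makes no claim about how the hardware units 1005-1030 are implemented.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum over a bytes object."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def ipv4_header_checksum(packet):
    """Checksum of the IPv4 header at the start of packet, options included."""
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    header = bytearray(packet[:header_len])
    header[10:12] = b"\x00\x00"  # the checksum field is zeroed before summing
    return internet_checksum(bytes(header))
```

On the inbound side, running internet_checksum over a header with its checksum field left in place yields zero when the header is intact.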

[0165] For an inbound IP datagram, the crypto hardware units 1005-1030 can also support calculation and validation of the TCP/UDP payload and IPv4 header checksums independently. Again, crypto hardware units 1005-1030 will support parsing of IPv4/v6 and TCP headers with options.

[0166] The crypto hardware units 1005-1030 can also perform the IPv4 header checksum, TCP/UDP payload checksum, and MPA FPDU framing operation in a one-pass operation. For an outbound datagram containing an MPA ULPDU, crypto hardware units 1005-1030 will parse the IP, TCP, and ULPDU header and add padding bytes, markers, and CRC32c, before calculating the TCP/UDP payload checksum and IP header checksum. For an inbound datagram containing an MPA FPDU, crypto hardware units 1005-1030 will validate both the IP header checksum and the TCP/UDP payload checksum, before validating the MPA CRC32c, validating the markers, and finally removing the CRC32c, markers, and padding bytes. The crypto hardware units 1005-1030 typically will not alter the IP payload length field or adjust TCP sequence numbers as these fields are typically precalculated or post-calculated by software 204.

[0167] The crypto co-processor 250 can also support SSL/TLS session key generation. For SSL, the crypto co-processor 250 will support the following operations:

[0168] Software 204 will provide crypto hardware units 1005-1030 with the starting byte address of the source data, which will consist of the 48-byte pre-master secret, followed by the 32-byte ClientHello.random and the 32-byte ServerHello.random data. The crypto hardware units 1005-1030 can then compute the 48-byte master secret and place it in the destination, according to the following formula:

MD5(pre_master_secret∥SHA-1(‘A’∥pre_master_secret∥ClientHello.random∥ServerHello.random))∥

MD5(pre_master_secret∥SHA-1(‘BB’∥pre_master_secret∥ClientHello.random∥ServerHello.random))∥

MD5(pre_master_secret∥SHA-1(‘CCC’∥pre_master_secret∥ClientHello.random∥ServerHello.random))

[0169] Software 204 will also provide crypto hardware units 1005-1030 with a length, in bytes, of the desired key block. The crypto hardware units 1005-1030 can also iterate using the following formula until enough output has been produced:

MD5(master_secret∥SHA-1(‘A’∥master_secret∥ServerHello.random∥ClientHello.random))∥

MD5(master_secret∥SHA-1(‘BB’∥master_secret∥ServerHello.random∥ClientHello.random))∥

MD5(master_secret∥SHA-1(‘CCC’∥master_secret∥ServerHello.random∥ClientHello.random))∥

[0170] The second operation may be chained with or performed separately from the first operation. If chained, the crypto hardware units 1005-1030 will concatenate the master secret with the key block.
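The two SSL derivations above share the same MD5-over-SHA-1 structure and differ only in the secret and in the order of the random values. The following Python sketch, which uses the standard hashlib module, illustrates both derivations and the chained form. The function names are hypothetical, and continuing the 'A', 'BB', 'CCC' label sequence with 'DDDD' and so on for longer key blocks is an assumption based on the pattern shown.

```python
import hashlib


def ssl3_expand(secret, seed, out_len):
    """Concatenate MD5(secret || SHA-1(label || secret || seed)) blocks."""
    out = b""
    round_index = 0
    while len(out) < out_len:
        if round_index >= 26:
            raise ValueError("requested output exceeds the label sequence")
        # Labels are 'A', 'BB', 'CCC', ... (assumed to continue alphabetically).
        label = bytes([ord("A") + round_index]) * (round_index + 1)
        inner = hashlib.sha1(label + secret + seed).digest()
        out += hashlib.md5(secret + inner).digest()
        round_index += 1
    return out[:out_len]


def ssl3_master_secret(pre_master_secret, client_random, server_random):
    """48-byte master secret; the seed order is ClientHello then ServerHello."""
    return ssl3_expand(pre_master_secret, client_random + server_random, 48)


def ssl3_key_block(master_secret, client_random, server_random, length):
    """Key block of the requested length; the seed order is ServerHello then ClientHello."""
    return ssl3_expand(master_secret, server_random + client_random, length)


def ssl3_chained(pre_master_secret, client_random, server_random, key_block_len):
    """Chained form: the master secret concatenated with the key block."""
    master = ssl3_master_secret(pre_master_secret, client_random, server_random)
    return master + ssl3_key_block(master, client_random, server_random, key_block_len)
```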

[0171] For TLS, the crypto hardware units 1005-1030 can compute the master secret and key block. In order to compute the master secret, software 204 provides the crypto hardware units 1005-1030 with the pre-master secret and Server and Client random values as for SSL. The crypto hardware units 1005-1030 compute the master secret as follows until enough output (in this case, 48 bytes) has been produced:

PRF(pre_master_secret, “master secret”, ClientHello.random∥ServerHello.random)

[0172] where the PRF is defined as:

P_MD5(pre_master_secret[47:24],“master_secret”∥ClientHello.random∥ServerHello.random) XOR P_SHA-1(pre_master_secret[23:0],“master_secret”∥ClientHello.random∥ServerHello.random)

with P_MD5/P_SHA-1 defined as:

H(0)=HMAC_MD5/SHA-1(secret, seed)

H(i)=HMAC_MD5/SHA-1(secret, H(i-1)∥seed)
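The recurrence above can be transcribed into Python with the standard hmac module, as sketched below. The function names are hypothetical. The sketch follows the H(0) and H(i) definitions exactly as written here, which are not the same as the P_hash construction in RFC 2246, so its output should not be expected to interoperate with standard TLS implementations. Because the byte ordering implied by the [47:24] and [23:0] notation is not spelled out, the two secret halves are passed in explicitly by the caller.

```python
import hmac


def p_hash(hash_name, secret, seed, out_len):
    """Concatenate H(0), H(1), ... as defined above until out_len bytes exist."""
    block = hmac.new(secret, seed, hash_name).digest()  # H(0) = HMAC(secret, seed)
    out = block
    while len(out) < out_len:
        # H(i) = HMAC(secret, H(i-1) || seed)
        block = hmac.new(secret, block + seed, hash_name).digest()
        out += block
    return out[:out_len]


def xor_bytes(left, right):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(left, right))


def prf(md5_secret, sha1_secret, label, seed, out_len):
    """P_MD5 XOR P_SHA-1 over label || seed, with the secret halves supplied by the caller."""
    data = label + seed
    return xor_bytes(
        p_hash("md5", md5_secret, data, out_len),
        p_hash("sha1", sha1_secret, data, out_len),
    )
```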

[0173] The software 204 may also have the crypto hardware units 1005-1030 compute the TLS key block by providing hardware the source address of the master secret, the server random value, client random value, and the desired length of the key block. The crypto hardware units 1005-1030 will iterate over the PRF using the following formula until enough output has been produced:

PRF(master_secret, “key expansion”, SecurityParameters.server_random∥SecurityParameters.client_random)

[0174] The second operation may be chained to the first. In this case, software 204 must store the pre-master secret and random values at the source location.

[0175] As used herein in connection with the description of the invention, the term “about” means ±10%. By way of example, the phrase “about 250” indicates a range of between 225 and 275. With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.

[0176] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

[0177] The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data that can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

[0178] It will be further appreciated that the instructions represented by the operations in FIG. 3 are not required to be performed in the order illustrated, and that all the processing represented by the operations may not be necessary to practice the invention. Further, the processes described in FIG. 3 can also be implemented in software stored in any one of or combinations of the RAM, the ROM, or the hard disk drive.

[0179] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

What is claimed is:
 1. A method of processing a cryptographic packetcomprising: receiving a first cryptographic packet in a host CPU;identifying a first set of data required to execute the firstcryptographic packet; transferring the first cryptographic packet andthe required first set of data to a cryptographic co-processor;executing the first cryptographic packet in the cryptographicco-processor; notifying the host CPU that the execution of the firstcryptographic packet is complete; and receiving the executed firstcryptographic packet in the host CPU.
 2. The method of claim 1, whereinidentifying the first set of data required to execute the firstcryptographic packet includes identifying the required first set of datain a first control word.
 3. The method of claim 2, wherein the controlword includes instructions for the crypto co-processor.
 4. The method ofclaim 1, wherein transferring the first cryptographic packet and therequired first set of data to the cryptographic co-processor includestransferring the first cryptographic packet and the required first setof data through a control queue.
 5. The method of claim 4, whereinidentifying the first set of data required to execute the firstcryptographic packet includes identifying the required first set of datain a first control word and wherein the first control word is located inthe control queue.
 6. The method of claim 5, wherein the first controlword identifies a first storage location of the first cryptographicpacket and a second storage location of the required first set of data.7. The method of claim 5, wherein notifying the host CPU that theexecution of the first cryptographic packet is complete includesmodifying a field in the first control word.
 8. The method of claim 7,wherein modifying the field in the first control word includesidentifying a third location of an execution result of the executedfirst cryptographic packet.
 9. The method of claim 8, wherein receivingthe executed first cryptographic packet in the host CPU includes thehost CPU retrieving the execution result from the third locationidentified by the first control word.
 10. The method of claim 1, whereintransferring the first cryptographic packet and the required first setof data to the cryptographic co-processor includes: receiving asubsequent packet in the host CPU; and executing the subsequent packetin the host CPU.
 11. The method of claim 10, wherein the subsequentpacket is a second cryptographic packet and executing the subsequentpacket in the host CPU includes: identifying a second set of datarequired to execute the second cryptographic packet; transferring thesecond cryptographic packet and the required second set of data to thecryptographic co-processor; executing the second cryptographic packet inthe cryptographic co-processor; notifying the host CPU that theexecution of the second cryptographic packet is complete; and receivingthe executed second cryptographic packet in the host CPU.
 12. The method of claim 11, wherein the second cryptographic packet is executed in the cryptographic co-processor substantially in parallel with executing the first cryptographic packet.
 13. The method of claim 11, wherein the second cryptographic packet is executed in the cryptographic co-processor in series with executing the first cryptographic packet.
 14. The method of claim 10, wherein the subsequent packet is a second cryptographic packet and executing the subsequent packet in the host CPU includes: identifying a second set of data required to execute the second cryptographic packet; transferring the second cryptographic packet and the required second set of data to the cryptographic co-processor via an interface; executing the second cryptographic packet in the cryptographic co-processor substantially in parallel with executing the first cryptographic packet; notifying the host CPU that the execution of the second cryptographic packet is complete; and receiving the executed second cryptographic packet in the host CPU.
 15. The method of claim 1, wherein notifying the host CPU that the execution of the first cryptographic packet is complete includes sending an interrupt request to the CPU.
 16. A microprocessor comprising: a host CPU; a cryptographic co-processor; and a control queue coupled to the host CPU and the cryptographic co-processor.
 17. The system of claim 16, wherein the cryptographic co-processor includes a plurality of hardware units.
 18. The system of claim 17, wherein the plurality of hardware units includes one or more crypto units that are optimized to perform a selected encryption process.
 19. The system of claim 16, wherein the control queue is a storage location in the microprocessor.
 20. The system of claim 16 further comprising an interface coupled between the host CPU and the cryptographic co-processor.
 21. The system of claim 20, wherein the interface is capable of transferring an instruction from the host CPU to the cryptographic co-processor.
 22. The system of claim 20, wherein the interface includes a plurality of hardware registers.
 23. A method of processing a cryptographic packet comprising: receiving a first cryptographic packet in a host CPU; identifying a first set of data required to execute the first cryptographic packet in a first control word, the first control word being located in a control queue and includes: identifying a first storage location of the first cryptographic packet; and identifying a second storage location of the required first set of data; transferring the first cryptographic packet and the required first set of data to a cryptographic co-processor; executing the first cryptographic packet in the cryptographic co-processor; modifying a field in the first control word to notify the host CPU that the execution of the first cryptographic packet is complete including: identifying a third location of an execution result of the executed first cryptographic packet; and retrieving the execution result, wherein the host CPU retrieves the execution result from the third location identified by the first control word.
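
For illustration only, the control-word flow recited in the claims above can be modeled in software. The following Python sketch is a toy simulation under stated assumptions and forms no part of the claims: the names ControlWord, host_submit, coprocessor_step and host_collect, the dictionary standing in for memory, the example addresses, and the XOR placeholder used in place of a real cryptographic operation are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(eq=False)  # compare control words by identity, not by field values
class ControlWord:
    packet_addr: int                   # first storage location: the packet
    data_addr: int                     # second storage location: required data
    result_addr: Optional[int] = None  # third location, set on completion
    done: bool = False                 # field modified to notify the host


memory: Dict[int, bytes] = {}          # toy shared memory (assumption)
control_queue: List[ControlWord] = []  # control queue shared by both sides


def host_submit(packet_addr: int, data_addr: int) -> ControlWord:
    """Host CPU: describe the packet and its data in a queued control word."""
    word = ControlWord(packet_addr, data_addr)
    control_queue.append(word)
    return word


def coprocessor_step(result_addr: int) -> bool:
    """Co-processor: execute the oldest pending control word, if any."""
    word = next((w for w in control_queue if not w.done), None)
    if word is None:
        return False
    packet = memory[word.packet_addr]
    key = memory[word.data_addr]
    # Placeholder transformation (XOR with the key); NOT a real cipher.
    memory[result_addr] = bytes(
        b ^ key[i % len(key)] for i, b in enumerate(packet)
    )
    word.result_addr = result_addr     # identify the third location
    word.done = True                   # completion notification
    return True


def host_collect(word: ControlWord) -> Optional[bytes]:
    """Host CPU: fetch the result once the control word is marked done."""
    if not word.done:
        return None
    control_queue.remove(word)
    return memory[word.result_addr]


# Example run with hypothetical addresses.
memory[0x100] = b"hello"               # the cryptographic packet
memory[0x200] = b"\x01"                # the required data (e.g. a key)
pending = host_submit(0x100, 0x200)
coprocessor_step(0x300)
result = host_collect(pending)
```

In this sketch the host polls the completion field. An interrupt request, as recited in claim 15, would replace the polling in host_collect with an asynchronous notification.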